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Despite the importance of perceiving and recognizing facial expressions in everyday life, 
there is no comprehensive test battery for the multivariate assessment of these abilities. 
As a first step toward such a compilation, we present 16 tasks that measure the perception 
and recognition of facial emotion expressions, and data illustrating each task's difficulty 
and reliability. The scoring of these tasks focuses on either the speed or accuracy of 
performance. A sample of 269 healthy young adults completed all tasks. In general, 
accuracy and reaction time measures for emotion-general scores showed acceptable 
and high estimates of internal consistency and factor reliability. Emotion-specific scores 
yielded lower reliabilities, yet high enough to encourage further studies with such 
measures. Analyses of task difficulty revealed that all tasks are suitable for measuring 
emotion perception and emotion recognition related abilities in normal populations. 

Keywords: emotion perception, emotion recognition, individual differences, psychometrics, facial expression 



INTRODUCTION 

Facial expressions are indispensable sources of information in 
face-to-face communication (e.g., Todorov, 2011). Thus, a cru- 
cial component of successful personal interactions is to rapidly 
perceive facial expressions and correctly infer others' internal 
states they convey. Although there is on-going debate between 
emotion theorists about the functions, and meanings of facial 
expressions, most contemporary approaches propose that facial 
expressions of emotions are determined by evaluation results and 
represent the efferent effects of the latter on motor behavior 
(cf Scherer et al, 2013). Specific facial expressions are emo- 
tions of the person the face belongs to (Walla and Panksepp, 
2013) and they play a crucial role in emotion communication. 
The perception and identification of emotions from faces pre- 
dicts performance on socio-emotional measures and peer ratings 
of socio-emotional skUls (e.g., Elfenbein and Ambady, 2002a; 
Rubin et al., 2005; Bommer et al, 2009). These measures of 
socio-emotional competences include task demands that ask the 
participant to perceive emotional facial expressions (e.g., Mayer 
et al., 1999). However, the measurement of emotional abilities 
should also include mnestic tasks because facial expressions of 
emotions in face-to-face interactions are often short-lived and 
the judgment of persons may partly rely on retrieval of their 
previous facial expressions. Attempts to measure the recognition 
of previously memorized expressions of emotions are rare. We 
will discuss available tasks for measuring emotion perception and 
recognition from faces and point to some shortcomings regard- 
ing psychometrics and task applicability. After this brief review, 
we wiU suggest theoretical and psychometric criteria for a novel 



task battery designed to measure the accuracy and the speed of 
performance in perceiving and recognizing emotion in faces. 

THE ASSESSMENT OF EMOTION PERCEPTION AND 
RECOGNITION FROM FACES 

The most frequently used task for measuring the perception and 
identification of emotions from faces is the Brief Affect Recognition 
Test (BART; Ekman and Friesen, 1974) and its enhanced ver- 
sion, the Japanese and Caucasian Brief Affective Recognition Test 
(JACBART; Matsumoto et al., 2000). The drawback of these tasks 
is that they present stimuli for a limited time (presentations last 
only two seconds) and therefore stress perceptual speed arguably 
more than measures avoiding such strictly timed stimulus expo- 
sitions (e.g., Hildebrandt et al., 2012). However, if stimuli of the 
BART or JACBART were presented for longer durations, perfor- 
mance on these tasks in unimpaired subjects would show a ceiling 
effect because most adults recognize prototypical expressions of 
the six basic emotions with high confidence and accuracy (Izard, 
1971). 

The Diagnostic Analysis of Nonverbal Accuracy (DANVA; 
Nowicki and Carton, 1993) is frequently used in individual dif- 
ference research (e.g., Elfenbein and Ambady, 2002a; Mayer et al., 
2008). The DANVA stimuli are faces of adults and children dis- 
playing one of four emotional expressions (happiness, sadness, 
anger, and fear) that vary between pictures in their intensity 
levels with the use of variable intensity levels corresponding to 
item difficulty. Therefore, the test can provide adequate per- 
formance scores for emotion recognition ability across a broad 
range of facial characteristics but it relies on a single assessment 
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method (affect naming). Stimulus exposition in the DANVA is 
also speeded. Thus, it is unclear to what extent individual dif- 
ferences in performance are due to the difficulty of recognizing 
expressions of emotions of lower intensity and to what extent they 
are due to the speeded nature of the task. 

Stimuli for the frequently used Profile of Nonverbal Sensitivity 
(PONS; Rosenthal et al., 1979) includes faces, voices, and body 
images and assesses emotion recognition from multimodal and 
dynamic stimuli. The use of dynamic facial stimuli is a pos- 
itive feature of PONS because it ensures a more naturalistic 
setting. One drawback is that it is limited to two emotion cate- 
gories (positive vs. negative affect) and only one method of affect 
naming. 

The Multimodal Emotion Recognition Test (MERT; Banziger 
et al., 2009) has the virtue of using both static and dynamic facial 
stimuli for testing the recognition of 10 emotion expression cat- 
egories. However, MERT is limited by implementing labeling as 
the sole method of affect naming. 

In a recent publication, Palermo et al. (2013) presented two 
novel tests (tasks) developed with the aim of overcoming some of 
the problems of the aforementioned tasks. One task presented by 
Palermo and colleagues was inspired by Herzmann et al. (2008) 
and implemented the odd-man-out paradigm with the aim of 
measuring the perception of facial expressions of emotion. The 
second task is based on the frequently used labeling paradigm 
that captures emotion identification and naming emotion expres- 
sions. Both tasks are based on stimuli presenting six categories 
of emotional expressions (happiness, surprise, fear, sadness, dis- 
gust, and anger) and were shown to capture individual differences 
in performance accuracy; thus, accuracy rates showed no ceiling 
effect in an unimpaired sample of 80 adults with an age range 
of 18^9 years. A further virtue of the test is its two-method 
approach. The test described by Palermo et al. is a first step for 
developing a multivariate task battery for measuring emotion 
perception and identification in non-clinical samples. 

Further task batteries have been described and used in the 
neuropsychological literature and were mainly developed for clin- 
ical purposes. Some examples are the Florida Affect Battery (FAB; 
Bowers et al., 1989) and the Comprehensive Affect Testing System 
(CATS; Froming et al., 2006). The FAB uses facial, vocal, and 
cross-modal stimuli and multiple methods (discrimination, nam- 
ing, selection, and matching); but it uses only static face stimuli. 
The multiple modalities and multiple methods approach of the 
FAB are outstanding features, but an important drawback of FAB 
is the low difficulty for unimpaired subjects (Bowers et al, 1989). 
This limitation also applies to the CATS. 

The revised Reading the Mind in the Eye Test (Baron-Cohen 
et al., 2001), originally intended to measure social sensitivity, 
arguably captures emotion recognition from the eye area only. 
In a four-alternative forced-choice-decision paradigm, stimuli 
depicting static eye regions are used that are assumed to represent 
36 complex mental states (e.g., tentative, hostile, decisive, etc.). 
Target response categories and their foils are of the same valence 
to increase item difficulty. This test aims to measure individual 
differences in recognizing complex affective states (Vellante et al, 
2012), however the test is not appropriate to capture individual 
differences in the perception of emotions because there are no 
unequivocal and veridical solutions for the items. 



Finally, there is a series of experimental paradigms designed 
to measure the identification of emotion in faces (e.g., Kessler 
et al., 2002; Montague et al, 2007). Single task approaches have 
the disadvantage that the measured performance variance due to a 
specific assessment method cannot be accounted for when assess- 
ing ability scores. Below we will mention experimental paradigms 
that were a source of inspiration for us in the process of task 
development when we describe the task battery. 

The literature about mnemonic emotional face tasks is 
sparse as compared with the abundance of emotion identifi- 
cation paradigms. In the memory task described by Hoheisel 
and Kryspin-Exner (2005) — the Vienna Memory of Emotion 
Recognition Tasks (VIEMER) — participants are presented with a 
series of faces showing emotional expressions. The participants' 
task is to memorize the facial identities for later recall. Individual 
faces presented with an emotional expression during the learning 
period are then displayed during the later recall period including 
several target and distracter faces all showing neutral expressions. 
Participants must identify the faces that were seen earlier dur- 
ing the learning period with an emotional expression. This task 
does not measure emotion recognition per se but the interplay of 
identity and expression recognition and does not allow for sta- 
tistical measurement of method variance. Similarly, experimental 
research on memory for emotional faces (e.g., D'Argembeau and 
Van der Linden, 2004; Grady et al, 2007) aimed to investigate the 
effects of emotional expressions on face identity recognition. This 
measure, in addition to the VIEMER, reflects an unknown mix- 
ture of expression and identity recognition. Next, we will briefly 
describe the criteria that guided our development of emotion per- 
ception and recognition tasks. After theoretical considerations for 
the test construction, we will outline psychometric issues. 

THEORIES AND MODELS 

The perception and identification of facially expressed emotions 
has been described as one of the basic abilities located at the 
lowest level of a hierarchical taxonomic model of Emotional 
Intelligence (e.g., Mayer et al, 2008). The mechanisms underly- 
ing the processing of facial identity and expression information 
and their neural correlates have been widely investigated in the 
neuro-cognitive literature. Models of face processing (Bruce and 
Young, 1986; Calder and Young, 2005; Haxby and Gobbini, 201 1) 
delineate stages of processing involved in recognizing two classes 
of facial information: (1) pictorial aspects and invariant facial 
structures that code facial identity and allow for extracting person- 
related knowledge at later processing stages; and (2) changeable 
aspects that provide information for action and emotion under- 
standing (most prominently eye gaze and facial expressions of 
emotion). In their original model, Bruce and Young (1986) sug- 
gested that at an initial stage of structural encoding, during 
which view-centered descriptions are constructed from the retinal 
input, the face processing stream separates into two pathways — 
one being involved in identifying the person and the second 
involved in processing changeable facial information such as facial 
expression or lip speech. Calder (2011) reviewed evidence from 
image-based analyses of faces, experimental effects representing 
similar configural and holistic processing of identity and facial 
expressions, but also neuroimaging and neuropsychological data. 
He concluded that at a perceptual stage there seems to be a 
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partly common processing route for identity and expression- 
related facial information (see also Young and Bruce, 2011). 
Herzmann et al. (2008) published a comprehensive task battery 
for the multivariate measurement of face cognition abilities — 
that is, of identity-related facial information processing. The 
present work aims to complement that task battery with mea- 
sures assessing the ability to perceive and recognize facial emotion 
expressions — that is, abilities regarding the processing of variable 
facial characteristics. 

Prevalent measures of emotion expression perception rely on 
classifying prototypical expression stimuli into emotion cate- 
gories. Under conditions of unlimited time, unimpaired subjects 
frequently perform at ceiling in such tasks. In order to avoid such 
ceiling effects researchers frequently manipulate task difficulty by 
using brief exposition times for stimuli. Such manipulations, if 
done properly, will decrease accuracy rates as desired — but they 
do not eliminate speed-related individual differences. Limited 
exposition times are likely to favor participants high in perceptual 
speed. Difficulty manipulations based on psychological theory 
(e.g., using composites of emotion expressions in stimuli, manip- 
ulate intensity of emotion expression) are conceptually better 
suited for developing novel measures of individual differences in 
unimpaired populations. 

Following functional models of facial emotion processing, we 
define perception of facial emotion expression as the ability to visu- 
ally analyze the configuration of facial muscle orientations and 
movements in order to identify the emotion to which a particular 
expression is most similar. Based upon a well-established distinc- 
tion in intelligence research (Carroll, 1993) we seek to distinguish 
between measures challenging the accuracy and the speed of per- 
formance, respectively. Speed measures of emotion perception are 
designed to capture the swiftness of decisions about facial emo- 
tion expressions and the identification of the emotion of which 
they are associated. Accuracy measures of emotion perception 
should assess the correctness of emotion identification. We define 
memory for facial emotion expressions as the ability to correctly 
encode, store, and retrieve emotional expressions from long- 
term memory. Speeded memory tasks are easy recognition tasks 
that capture the time required to correctly recognize previously 
well-learned emotion expressions. Accuracy based memory tasks 
express the degree to which previously learned emotional faces, 
that were not over learned, are correctly identified during recall. 

DESIDERATA FOR TASK DEVELOPMENT AND 
PSYCHOMETRIC CONSIDERATIONS 

A first crucial requirement on test construction is to base 
the measurement intention on models of the neuro-cognitive 
processes that ought to be measured. Second, an integrative view 
incorporating experimental and individual difference evidence 
is facilitated if the developed experimental task paradigms 
are adapted to psychometric needs (O'SuUivan, 2007; Scherer, 
2007; Wilhelm et al, 2010). Without reliance on basic emotion 
research there is considerable arbitrariness in deriving tasks from 
broad construct definitions. Third, scores from single tasks are 
inadequate manifestations of highly general dispositions. This 
problem is prevalent in experimental approaches to studying 
emotion processing. Other things being equal, a multivariate 



approach to measure cognitive abilities is generally superior to 
task specific measures because it allows for abstracting from 
task specificities. Fourth, assessment tools should be based on a 
broad and profoundly understood stimulus base. Wilhelm (2005) 
pointed to several measurement specificities that are commonly 
treated as irrelevant in measuring emotional abilities. Specifically, 
generalizations from a very restricted number of stimuli are 
a neglected concern (for a more general discussion, see ludd 
et al, 2012). O'SuUivan (2007) emphasized the impact of a pro- 
found understanding of stimulus characteristics for measuring 
emotional abilities and conjectured that this understanding is 
inadequate for most of the available measures. 

The following presented work describes conceptual and psy- 
chometric features of a multivariate test battery. We assessed the 
difficulty and the psychometric quality of a broad variety of per- 
formance indicators that can be derived on the basis of 16 tasks 
for measuring the accuracy or speed of the perception or recog- 
nition of facially expressed emotions. Then, the battery will be 
evaluated and applied in subsequent studies. All tasks are avail- 
able for non-commercial research purposes upon request from 
the corresponding author. 

METHODS 
SAMPLE 

A total of 273 young adults (who reported to have no psychi- 
atric disorders), between 18 and 35 years of age, participated in 
the study. They all lived in the Berlin area and self-identified as 
Caucasian. Participants were contacted via newspaper advertise- 
ments, posters, flyers, and databases of potential participants. Due 
to technical problems and dropouts between testing sessions, four 
participants had missing values for more than five tasks and were 
excluded from the analyses. The final sample included 269 par- 
ticipants (52% females). Their mean age was 26 years (SD = 6). 
Their educational background was heterogeneous: 26.8% did 
not have degrees qualifying for college education, 62.5% had 
only high school degrees, and 10.7% held academic degrees (i.e., 
some sort of college education). All participants had normal or 
corrected-to-normal visual acuity. 

DEVELOPMENT OF THE EMOTIONAL FACE STIMULI DATABASE USED 
FOR THE TASKS 
Photo shooting 

Pictures were taken in individual photo sessions with 145 
(72 males) Caucasian adults ranging in age from 18 to 35 
years. Models were recruited via newspaper advertisements. 
Photographs were taken with similar lighting and identical back- 
ground conditions. Models did not wear makeup, piercings, or 
beards. Glasses were removed during the shooting and when 
needed, hair was fixed outside the facial area. In order to elicit 
emotional expressions, we followed the procedure described by 
Ebner et al. (2010). Each photography session consisted of three 
phases: emotion induction, personal experiences, and imitation. 
Photographs were taken continuously through all three phases, 
and at least 150 pictures of each person were stored. Each expo- 
sure was taken from three perspectives (frontal and right and 
left three-quarter views) with synchronized cameras (Nikon D- 
SLR, D5000) from a distance of 3 meters. From this pool of 
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145 faces, 122 face identities (50% females) in total were used as 
stimuli across all tasks. The pictures were selected according to 
their photographic quality and expression quality evaluated by a 
trained researcher and FaceReader software codes (see validation 
below). 

Expression elicitation was structured into three phases; the 
three phases were completed for one emotion before starting with 
the production of the next emotional expression. The sequence 
was: neutral, sadness, disgust, fear, happiness, anger, and surprise. 
The first expression elicitation phase was the Emotion Induction 
Phase. We elicited emotional expressions by a subset of 16 pictures 
from the International Affective Picture System (lAPS; Lang et al, 
2008) that were presented one after another by being projected on 
a back wall. Models were asked to carefully look at the pictures, 
identify which emotion the picture elicited in them, and display 
that emotion in their face with the intention to communicate it 
spontaneously. Models were also instructed to communicate the 
emotion with their face at a level of intensity that would make a 
person not seeing the stimulus picture understand what emotion 
the picture elicited in them. We used four neutral, five sadness, six 
disgust, seven fear, and four happiness inducing lAPS pictures. 
Since there are no lAPS pictures for anger, the induction phase 
for anger started with the second phase. After all pictures within 
one emotion were presented continuously, models were asked 
to choose one of the pictures for closer inspection. During the 
inspection of the selected picture further photographs were taken. 

The second expression elicitation phase was the Personal 
Experience Phase. Models were asked to imagine a personally rel- 
evant episode of their lives in which they strongly experienced 
a certain emotional state corresponding to one of the six emo- 
tions (happiness, surprise, fear, sadness, disgust, and anger). The 
instructions were the same as in the first phase: communicate an 
emotional expression with the face so that a second person would 
understand the expressed feeling. 

The third expression elicitation phase was the Imitation Phase. 
Models were instructed by written and spoken instructions based 
on emotion descriptions according to Ekman and Friesen (1976) 
regarding how to exert the specific muscular activities required 
for expressing the six emotions in the face. In contrast to the 
previous phases, no emotional involvement was necessary in the 
imitation part. Models were guided to focus on the relevant areas 
around the eyes, the nose, and the mouth and instructed on how 
to activate these regions in order to specifically express one of 
the six basic emotions. During this phase photographers contin- 
uously supported the models by providing them with feedback. 
Models were also requested to inspect their facial expression in a 
mirror. They had the opportunity to compare their own expres- 
sion with the presented expression model from the database by 
Ekman and Friesen (1976) and to synchronize their expression 
with a projected prototypical emotion portrait. 

Validation of facial expressions 

First, trained researchers and student assistants selected those 
pictures that had an acceptable photographic quality. From 
all selected pictures those that clearly expressed the intended 
emotion, including low intensity pictures, were selected for 
aU models. Facial expressions were coded regarding the 



target emotional expression along with the other five basic 
emotion expressions and neutral expressions using FaceReader 
3 software (http://www.noldus.com/webfm_send/569). Based on 
FaceReader 3 emotion ratings, the stimuli were assigned to the 
tasks described below. Overall accuracy rate of FaceReader 3 at 
classifying expressions of younger adults is estimated 0.89 and 
classification performance for separate emotion categories are as 
follows: Happiness 0.97; Surprise 0.85; Fear 0.93; Sadness 0.85; 
Disgust 0.88; and Anger 0.80 (Den Uyl and van Kuilenburg, 
2005). 

Editing 

All final portraits were converted to grayscale and fitted with a 
standardized head-size into a vertical elliptical frame of 200 x 300 
pixels. During post-processing of the images, differences in skin 
texture were adjusted and non-facial cues, like ears, hair and 
clothing, were eliminated. Physical attributes like luminance and 
contrast were held constant across images. Each task was balanced 
with an equal number of female and male stimuli. Whenever two 
different identities were simultaneously presented in a given trial, 
portraits of same sex models were used. 

GENERAL PROCEDURE 

All tasks were administered by trained proctors in group-sessions 
with up to 10 participants. There were three sessions for every 
participant, each lasting about three hours, including two breaks 
of 10 min. Sessions were completed in approximately weekly 
intervals. Both task and trial sequences were kept constant across 
all participants. Computers with 17-inch monitors (screen def- 
inition: 1366 x 768 pixel; refresh rate: 60 Hz) were used for 
task administration. The tasks were programmed in Inquisit 3.2 
(Millisecond Software). Each task started at the same time for all 
participants in a given group. In general, participants were asked 
to work to the best of their ability as quickly as possible. They 
were instructed to use the left and right index fingers during tasks 
that used two response options and to keep the fingers positioned 
directly above the relevant keys throughout the whole task. Tasks 
with four response options were organized such that the partici- 
pant only used the index finger of a preferred hand. Every single 
task was introduced by proctors and additional instructions were 
provide on screen. There were short practice blocks in each task 
consisting of at least 5 and at most 10 trials (depending on task 
difficulty) with trial-by-trial feedback about accuracy. There was 
no feedback for any of the test trials. Table 1 gives an overview of 
the tasks included in the task battery. 

DATA TREATMENT AND STATISTICAL ANALYSES 

The final dataset (N = 269) was visually screened for outliers in 
uni- and bivariate distributions. Outliers in univariate distribu- 
tions were set to missing. For the approximately 0.2% of missing 
values after outlier elimination a multiple random imputation 
(e.g., Allison, 2001) was conducted. With this procedure, plausi- 
ble values were computed as predicted values for missing observa- 
tions plus a random draw from the residual normal distribution 
of the respective variable. One of the multiple datasets was used 
for the analyses reported here. Results were verified and do not 
differ from datasets obtained through multiple imputation with 
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Table 1 | Overview of the tasks. 



Task 


Name of the task 


Ability domain 


Duration In min. 


# of Blocks/Trials 


# of Faces 


1 


Identification of emotion expressions from composite faces 


EP 


10 


1/72 


8 


2 


Identification of emotion expressions of different intensity from 
uprigint and inverted dynamic face stimuli 


EP 


12 


1/72 


12 


3 


Visual search for faces with corresponding emotion expressions of 
different intensity 


EP 


17 


1/40 


4 


4 


Emotion hexagon — identification of mix-ratios in expression continua 


EP 


15 


1/60 


10 


5 


Emotion hexagon — discrimination 


EP 


10 


1/60 


10 


6 


Learning and recognition of emotion expressions of different 
intensity 


EM 


18 


4/72 


4 


7 


Learning and recognition of emotional expressions from different 
viewpoints 


EM 


15 


4/56 


4 


8 


Learning and recognition of mixed emotion expressions in 


EM 


15 


4/56 


4 




expression morphs 










9 


Cued emotional expressions span 


EM 


10 


7/32 


4 


10 


Memory for facial expressions of emotions 


EM 


10 


4/27 


4 


11 


Emotion perception from different viewpoints 


SoEP 


8 


1/31 


14 


12 


Emotional odd-man-out 


SoEP 


6 


1/30 


18 


13 


Identification speed of emotional expressions 


SoEP 


10 


1/48 


8 


14 


7-back recognition speed of emotional expressions 


SoEM 


8 


4/24 


4 


15 


Delayed non-matching to sample with emotional expressions 


SoEM 


10 


1/36 


18 


16 


Recognition speed of morphed emotional expressions 


SoEM 


6 


6/36 


6 



ER emotion perception; EM, emotion memory; SoEF^ speed of emotion perception; SoEIVI, speed of emotion memory; the expected predominant source of 
performance variability is ttie performance accuracy for the tasks 1-10 and the response latency (speed) for the tasks 11-16; Duration in min. includes instruction 
time and practice; # of Blocks/Thais: number of blocks and the sum of trials across block included in the task; # of Faces: number of face identities used to design a 
specific task; the total number of face identities used in the task battery is 145; the total duration is 180 min. 



the R package mice, by van Buuren and Groothuis-Oudshoorn 
(2011). 

Reaction time (RT) scores were only computed from cor- 
rect responses. RTs smaller than 200 ms were set to missing, 
because they were considered too short to represent proper pro- 
cessing. The remaining RTs were winsorized (e.g., Barnett and 
Lewis, 1978); that is, RTs longer than 3 SDs above the indi- 
vidual mean were fixed to the individual mean RT plus 3 SD. 
This procedure was repeated iteratively beginning with the slow- 
est response until there were no more RTs above the criterion 
of 3 SD. 

All analyses were conducted with the statistical software envi- 
ronment R. Repeated measures ANOVAs (rmANOVA) were per- 
formed with the package ez (Lawrence, 2011) and reliability 
estimates with the package ps/c?; (Revelle, 2013). 

SCORING 

For accuracy tasks, we defined the proportion of correctly solved 
trials of an experimental condition of interest (e.g., emotion cate- 
gory, expression intensity, presentation mode) as the performance 
indicator. For some of these tasks we applied additional scoring 
procedures as indicated in the corresponding task description. 
Speed indicators were average inverted RTs (measures in seconds) 
obtained across all correct responses associated with the trials 
from the experimental conditions of interest. Note that accu- 
racy was expected to be at ceiling in measures of speed. Inverted 
latency was calculated as 1000 divided by the RT in milliseconds. 



PERCEPTION AND IDENTIFICATION TASKS OF FACIAL 
EMOTION EXPRESSIONS 

TASK 1: IDENTIFICATION OF EMOTION EXPRESSIONS FROM 
COMPOSITE FACES 

Calder et al. (2000) proposed the Composite Face Paradigm 
(e.g., Young et al., 1987) for investigating perceptual mecha- 
nisms underlying facial expression processing and particularly for 
studying the role of configural information in expression percep- 
tion. Composite facial expressions were created by aligning the 
upper and the lower face half of the same person, but from pho- 
tos with different emotional expressions, so that in the final photo 
each face was expressing an emotion in the upper half of the 
face that differed from the emotion expressed in the lower half 
of the face. Aligned face halves of incongruent expressions lead to 
holistic interference. 

It has been shown that an emotion expressed in only one face 
half is less accurately recognized compared to congruent emo- 
tional expressions in face composites (e.g., Tanaka et al., 2012). 
In order to avoid ceiling effects, as is common for the perception 
of emotions from prototypical expressions, we took advantage of 
the higher task difficulty imposed by combining different facial 
expressions in the top and bottom halves of faces, and exploited 
the differential importance of the top and bottom face for the 
recognition of specific emotions (Ekman et al., 1972; Bassili, 
1979). Specifically, fear, sadness, and anger are more readily rec- 
ognized in the top half of the face and happiness, surprise, and 
disgust in the bottom half of the face (Calder et al., 2000). Here, 
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we used the more readily recognizable halves for the target halves 
in order to ensure acceptable performance. Top halves express- 
ing fear, sadness, or anger were only combined with bottom 
halves expressing disgust, happiness, or surprise — yielding nine 
different composites (see Figure 1 for examples of all possible 
composite expression stimuli of a female model). 

Procedure 

After the instruction and nine practice trials, 72 experimental 
trials were administered. The trial sequence was random across 
the nine different emotion composites. Pictures with emotional 
expressions of four female and four male models were used to 
create the 72 emotion composites. For each model, nine aligned 
composite faces were created. In each trial, following a fixation 
cross, a composite face was presented in the center of the screen. 
The prompt words "TOP" or "BOTTOM" were shown above 
the composite face to indicate for which face half the expression 
should be classified; the other half of the face was to be ignored. 
Six labeled "buttons" (from left to right: "happiness," "surprise," 
"anger," "fear," "sadness," "disgust") were aligned in a horizon- 
tal row on the screen below the stimuli. Participants were asked 



Angry - Disgust Angry - Happiness Angry - Surprise 




Fear - Disgust 



Fear - Happiness Fear - Surprise 





Sadness - Disgust 



Sadness - Happiness 



Sadness - Surprise 



FIGURE 1 I Stimuli examples used in Task 1 (Identification of emotion 
expression from composite faces). 



to click with a computer mouse the button corresponding to the 
emotion in the prompted face half. After the button was clicked 
the face disappeared and the screen remained blank for 500 ms; 
then the next trial started with the fixation cross. 

Scoring 

In addition to the proportion of correct responses across a series 
of 72 trials, we calculated unbiased hit rates (Hu, Wagner, 1993). 
Unbiased hit rates account for response biases toward a specific 
category and correct for systematic confusions between emo- 
tion categories. For a specific emotion score H„ was calculated 
as squared frequency of the correct classifications divided by the 
product of the number of stimuli used for the different emo- 
tion categories and the overall frequency of choices for the target 
emotion category. We report difficulty estimates for both percent 
correct and Hj,. 

Results and discussion 

Table 2 summarizes performance accuracy across all adminis- 
tered trials and for specific emotional expressions along with 
reliability estimates computed with Cronbach's Alpha (a) and 
Omega (co; McDonald, 1999). We calculated reliabilities on 
the basis of percent correct scores. Difficulty estimates in 
Table 2 based on percent correct scores show that performance 
was not at ceiling. The distributions across persons for the 
happiness, surprise, and anger trials were negatively skewed 
(—1.61, —0.87, —1.05), suggesting a somewhat censored distri- 
butions to the right, but for no participant was accuracy at 
ceiling. In an rmANOVA the emotion category showed a strong 
main effect: [f(5, 1340) = 224.40, p < 0.001, t)^ = 036]. Post-hoc 
analyses indicate happiness was recognized the best, followed by 
surprise, anger, disgust, fear, and sadness. This ranking was sim- 
ilar for Hi, scores (see Table 2). However, when response biases 
were controlled for, anger was recognized better than surprise. 
Percent correct and H„ scores across all trials were correlated 0.99 
{p < 0.001), indicating that the scoring procedure do not notably 
affect the rank order of persons. 

Reliability estimates across aU trials were very good and across 
all trials for a single emotion, considering the low number of 
trials for single emotions and the unavoidable heterogeneity of 
facial stimuli, were satisfactory (ranging between 0.59 and 0.75). 
Difficulty estimates suggest that performance across persons was 
not at ceiling. The psychometric quality of single emotion expres- 
sion scores and performance on the overall measure are satisfac- 
tory to high. Adding more trials to the task could further increase 
the reliability of the emotion specific performance indicators. 

TASK 2: IDENTIFICATION OF EMOTION EXPRESSIONS OF DIFFERENT 
INTENSITY FROM UPRIGHT AND INVERTED DYNAMIC FACES 

Motion facilitates emotion recognition from faces (e.g., Wehrle 
et al, 2000; Recio et al, 2011). Kamachi et al. (2001) used mor- 
phed videos simulating the dynamics of emotion expressions and 
showed that they are partly encoded on the basis of static infor- 
mation but also from motion-related cues. Ambadar et al. (2005) 
demonstrated that facial motion also promotes the identifica- 
tion accuracy of subtle, less intense emotion displays. In Task 2, 
we used dynamic stimuli in order to extend the measurement 
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Table 2 | Descriptive statistics and reliability estimates of performance accuracy for all emotion perception tasks across all trials and for single 
target emotions. 



Condition 


Accuracy M {SD, SE) Alternative score M {SD, SE) 


Alpha / Omega / # of trials 




TASK 1: IDENTIFICATION OF EMOTION EXPRESSIONS FROM COMPOSITE FACES ^mm^^^^^^^^^k fl^H 


Overall (range: 0.37-0.89) 


0.66 (0.11, 0.01) 0.47 (0.14, 0.01)** 


0.81/0.81/72 


Happiness (1 *) 


0.84(0.19,0.01) 0.59(0.19,0.01) 


0.74/0.75/12 


Surprise (2) 


0.78 (0.20, 0.01) 0.51 (0.19, 0.01) 


0.73/0.73/12 


Fear (5) 


0.49(0.20,0.01) 0.30(0.17 0.01) 


0.61/0.61/12 


Sadness (6) 


0.45 (0.22, 0.01) 0.31 (0.19, 0.01) 


0.69/0.70/12 


Disgust (4) 


0.66 (0.19, 0.01) 0.51 (0.21, 0.01) 


0.67/0.67/12 


Anger (3) 


0.76(0.17 0.01) 0.59(0.20,0.01) 


0.59/0.59/12 


TASK 2: IDENTIFICATION OF EMOTION EXPRESSIONS FROM UPRIGHT AND INVERTED DYNAMIC FACES ^^^^^^^^^^^^^^M 


Overall (range: 0.46-0.85) 


0.68 (0.07 0.00) 0.48 (0.09, 0.01)** 


0.62/0.62/72 


Happiness (1 *) 


0.94(0.07 0.00) 0.76(0.13,0.01) 


0.23/0.32/09*** 


Surprise (2) 


0.83(0.13,0.01) 0.59(0.13,0.01) 


0.61/0.63/12 


Fear (6) 


0.49 (0.20, 0.01) 0.31 (0.17 0.01) 


0.61/0.62/13 


Sadness (5) 


0.55(0.19,0.01) 0.40(0.15,0.01) 


0.55/0.56/12 


Disgust (3) 


0.67(0.19,0.01) 0.43(0.16,0.01) 


0.65/0.65/12 


Anger (4) 


0.63(0.14,0.01) 0.42(0.12,0.01) 


0.29/0.31/11 




Overall (range: 0.24-0.94) 


0.76(0.14,0.01) 4.54(0.70,0.04) 


0.86/0.87/40 


Surprise (1 *) 


0.89(0.15,0.01) 5.42(0.87 0.05) 


0.60/0.61/08 


Fear (5) 


0.60 (0.22, 0.01 ) 4.03 (1.07 0.07) 


0.47/0.48/08 


Sadness (3) 


0.82(0.17 0.01) 5.00(0.91,0.06) 


0.62/0.63/08 


Disgust (2) 


0.86 (0.19, 0.01 ) 5.42 (0.95, 0.06) 


0.64/0.64/08 


Anger (4) 


0.62 (0.22, 0.01 ) 3.41 (0.94, 0.06) 


0.53/0.54/08 


TASK 4: EMOTION HEXAGON- 


-IDENTIFICATION OF MIX-RATIOS IN EXPRESSION CONTINUA 




Overall (range: 8.26-60.51) 


14.29(5.38,0.33)**** 


0.93/0.94/60 


Happiness (1 *) 


11.67 (5.96, 0.36) 


0.78/0.80/10 


Surprise (5) 


15.36 (5.67 0.35) 


0.66/0.69/10 


Fear (4) 


15.27(6.35,0.39) 


0.63/0.66/10 


Sadness (6) 


16.82(5.89,0.36) 


0.61/0.64/10 


Disgust (3) 


14.15(6.03,0.37) 


0.69/0.71/10 


Anger (2) 


12.48 (6.47 0.39) 


0.75/0.78/10 


TASK 5: EMOTION HEXAGON- 






Overall (range: 0.62-0.92) 


0.80 (0.06, 0.00) 


0.63/0.64/60 


Happiness (1 *) 


0.90(0.11,0.01) 


0.39/0.44/10 


Surprise (3) 


0.78(0.13,0.01) 


0.24/0.26/10 


Fear (2) 


0.81(0.12,0.01) 


0.27/0.28/10 


Sadness (5) 


0.64(0.16,0.01) 


0.33/0.35/10 


Disgust (1) 


0.90(0.10,0.01) 


0.13/0.21/10 


Anger (4) 


0.76(0.15,0.01) 


0.45/0.47/10 


Note. M, means, SE, standard errors, SD, standard deviations; Aiptia (a) and Omega (id) coefficients; # of Trials, the number of trials used for calculating the reliability 


coefficients; task names are shortened for this table; * the rank order of recognizability across emotions is indicated in the brackets; 


•*, Unbiased Hit Rate (Wagner, 


1993); ***, there was no variance 


in three items because they have been correctly solved by all subjects, reliability estimates are based on 9 out of 12 trials 


displaying facial expressions of happiness; score: amount of deviation of participants' response from the correct proportion of the mixture between the fwo 


parent expressions; the chance probability in case of Task 1 and 2 is 0. 16 and 0.50 for Task 5; the chance probability is no relevant measure for Task 3 and 4. 



of emotion identification to more real life-like situations and to 
ensure adequate construct representation of the final task battery 
(Embretson, 1983). 

Because previous findings predict higher accuracy rates for 
emotion identification from dynamic stimuli, we implemented 
intensity manipulations in order to avoid ceiling effects. Hess 
et al. (1997) investigated whether the intensity of a facial emotion 



expression is a function of muscle displacement compared with 
a neutral expression and reported decreased accuracy rates for 
static expression morphs of lower expression intensity. We gener- 
ated expression-end-states by morphing intermediate expressions 
between a neutral and an emotional face. Mixture ratios for the 
morphs aimed at three intensity levels by decreasing the pro- 
portion of neutral relative to the full emotion expressions from 
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60:40% (low intensity) to 40:60% (middle) to 20:80% (high 
intensity). 

In order to capture the contrast between configural vs. feature- 
based processing of facial expressions, we also included stimulus 
orientation manipulations (upright vs. inverted). Face inversion 
strongly impedes holistic processing, allowing mainly feature- 
based processing (Calder et al., 2000). McKelvie (1995) indicated 
an increase of errors and RTs of emotion perception from static 
faces presented upside-down and similar findings were reported 
for dynamic stimuli as well (Ambadar et al, 2005). 

Procedure 

Short videos (picture size was 200 x 300 pixel) displaying 30 
frames per second were presented in the center of the screen. 
The first frame of the video displayed a neutral facial expres- 
sion that, across the subsequent frames, changed to an emotional 
facial expression. The videos ended at 500 ms and the peak expres- 
sion displayed in the last frame remained on the screen until 
the categorization was performed. Emotion label buttons were 
the same as in the previous task. We varied expression intensity 
across trials, with one third of the trials for each intensity level. 
The morphing procedure was similar to the procedure used in 
previous studies (e.g., Kessler et al., 2005; Montagne et al, 2007; 
Hoffmann et al., 2010) and included two steps. First, static pic- 
tures were generated by morphing a neutral expression image of 
a face model with the images of the same person showing one 
of the 6 basic emotions; mixture ratios were 40, 60, or 80 per- 
cent of the emotional face. Second, short video sequences were 
produced on the basis of a morphed sequence of frames starting 
from a neutral expression and ending with one of emotional faces 
generated in the first step. Thus, video sequences were created for 
all three intensities; this was done separately for two female and 
two male models. Half of the 72 trials were presented upright and 
the other presented upside down. Following the instructions par- 
ticipants completed four practice trials. The experimental trials 
with varying conditions (upright vs. upside-down), basic emo- 
tion, and intensity were presented in pseudo-randomized order 
but fixed across participants. 

Results and discussion 

In addition to results for the percent correct scores, we also report 
unbiased hit rates (see above). Table 2 summarizes the average 
performance calculated for both, percent correct and unbiased 
hit rates (the scores are correlated 0.98) along with reliability 
estimates, which were all acceptable, except the low omega for 
anger recognition. It seems that the facial expressions of anger 
used here were particularly heterogeneous. There were no ceil- 
ing effects in any of the indicators. An rmANOVA with factors 
for emotion expression and expression intensity revealed main 
effects for both. Emotion expression explained 34% of the vari- 
ance in recognition rates, [F^^ 1340) = 327.87, p < 0.001, = 
0.34] whereas the intensity effect was small [F(2, 536) = 17.98, 
p < 0.001, = 0.01]. The rank order of recognizability of dif- 
ferent emotional expressions was comparable with Task 1, which 
used expression composites (cf. Figures 2A,B). Happiness and 
surprise were recognized the best, followed by anger and dis- 
gust, and finally sadness and fear were the most difficult. An 



interaction of emotion expression and intensity, [-F(io, 268O) = 
96.94, p < 0.001, Ti^ = 0.13], may indicate that expression peaks 
of face prototypes used for morphing varied in their intensity 
between models and emotions. Scores calculated across all trials 
within single emotions disregarding the intensity manipulation 
had acceptable or good psychometric quality. 

TASK 3: VISUAL SEARCH FOR FACES WITH CORRESPONDING EMOTION 
EXPRESSIONS OF DIFFERENT INTENSITY 

Task 3 was inspired by the visual search paradigm often imple- 
mented for investigating attention biases to emotional faces (e.g., 
Frischen et al, 2008). In general, visual search tasks require the 
identification of a target object that differs in at least one feature 
(e.g., orientation, distance, color, or content) from non-target 
objects displayed at the same time. In this task, participants had 
to recognize several target facial expressions that differed from a 
prevailing emotion expression. Usually, reaction time slopes are 
inspected as dependent performance variables in visual search 
tasks. However, we set no limits on response time and encouraged 
participants to screen and correct their responses before confirm- 
ing their choice. This way we aimed to minimize the influence 
of visual saliency of different emotions on the search efficiency 
due to pre-attentive processes (Calvo and Nummenmaa, 2008) 
and capture intentional processing instead. This task assessed 
the ability to discriminate between different emotional facial 
expressions. 

Procedure 

In each trial, a set of nine images of the same identity was pre- 
sented simultaneously, arranged in a 3 x 3 grid. The majority 
of the images displayed one emotional expression (surprise, fear, 
sadness, disgust, or anger) referred to here as the target expres- 
sion. In each trial participants were asked to identify the neutral 
and emotional expressions. Experimental manipulations incor- 
porated in each trial were: (1) choice of distracter emotion expres- 
sion, (2) the number of distracter emotion expressions — ranging 
from 1 to 4, and (3) the target expression. Happiness expressions 
were not used in this task because performance for smiling faces 
was assumed to be at ceiling due to pop out effects. The loca- 
tion of target stimuli within the grid was pseudo-randomized. 
Reminders at the top of the screen informed participants of 
the number of distracters to be detected in a given trial (see 
Figures for an example). Participants' task was to identify and 
indicate all distracter expressions by clicking with their mouse 
a tick box below each stimulus. It was possible to review and 
correct all answers before submitting one's response; participants 
confirmed their responses by clicking the "next" button starting 
the next trial. The task aimed to implement two levels of diffi- 
culty by using target and distracter expressions with low and high 
intensity. 

All 360 stimuli were different images originating from 
four models (two females and two males). Intensity level 
was assessed with the FaceReader software. Based on these 
intensity levels, trials were composed of either low or high 
intense emotion stimuli for targets as well as for dis- 
tracters within the same trial. The number of divergent 
expressions to be identified was distributed uniformly across 
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FIGURE 2 I Plots of the rank order of recognizabllity of the 
different emotion categories esteemed in emotion perception task. 

(A) Task 1, Identification of Emotion Expression from composite 
faces; (B) Task 2, Identification of Emotion Expression of different 



intensity from upright and inverted dynamic face stimuli; (C) 

Task 3, Visual search for faces with corresponding Emotion 

Expression of different intensity, error bars represent confidence 
intervals. 



Please find 4 expressions which deviate from the majority of expressions! 
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FIGURE 3 I Schematic representation of a trail from Task 3 (Visual 
search for faces with corresponding emotion expression of different 
intensity). 



conditions. There were 40 experimental trials administered 
after three practice trials, which followed the instructions. The 
accuracies of the multiple answers for a trial are dependent 
variables. 



Scoring 

We applied three different scoring procedures. The first was based 
on the proportion of correctly recognized targets. This procedure 
only accounts for the hit rates, disregards false alarms, and can 
be used to evaluate the detection rate of target facial expressions. 
For the second, we computed a difference score between the hit- 
rate and false-alarm rate for each trial. This score is an indicator 
of the ability to recognize distracter expressions. For the third, 
we calculated d'prime-scores [Z(hit rate) — Z(false-alarm rate)] 
for each trial separately. The average correlation between the three 
scores was r = 0.96 (p < 0.001), suggesting that the rank order of 
individual differences was practically invariant across scoring pro- 
cedures. Next, we will report proportion correct scores. Table 2 
additionally displays average performance based on the d'prime 
scores. 

Results and discussion 

The univariate distributions of emotion-specific performance 
indicators and the average performance — displayed in Table 2 — 
suggest substantial individual differences in accuracy measures. 
The task design was successful at avoiding ceiling effects fre- 
quently observed for recognition performance of prototypical 
expressions. This was presumably achieved by using stimuli of 
varying expression intensity and by the increasing number of dis- 
tracters across trials. Reliability estimates of the overall score were 
excellent (a = 0.86; oo = 0.87). Considering that only eight trials 
entered the emotion specific scores and that emotional expres- 
sions are rather heterogeneous, reliability estimates (ranging from 
0.48-0.64) are satisfactory. 

An rmANOVA with two within subject factors, emotional 
expression and difficulty (high vs. low expression intensity), 
revealed that the expressed emotion explained 21% of the vari- 
ance of recognition rates, [-F(4_ 1072) = 244.86, p < 0.001, = 
0 .2 1 ]. The rank orders of recognizabllity of the emotion categories 
were slightly different from those estimated in Task 1 and 2 (see 
Figures 2C,B compared with Figures 2A,B). Surprised faces were 
recognized the best, as was the case for Task 2. Anger faces were 
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recognized considerably worse than sadness faces. This inconsis- 
tency might be due to effects of stimulus sampling. Performance 
on fear expressions was the poorest. 

The difficulty manipulation based on high vs. low intensity of 
the target emotional expression, as well as intensity of distracter 
expressions, was successful as expressed by the main effect 
of difficulty in the rmANOVA, 268)= 638.26, p < 0.001, 
T)^ = 0.16]. There was a significant interaction between inten- 
sity and emotion category [_F(4^ 1072) = 100.82, p < 0.001, 

= 0.09], were more intense expressions were recognized 
better within each expression category but to a different degree. 
The ratios of the difference between low and high intensity 
conditions varied across emotions: surprise — ^Measy = 0.93, 
Mdifficult = 0.84 [f(268) = 8.05, p < 0.001]; fear— Measy = 0.83, 
Mdifficult = 0.37 [f(268) = 21.96, p < 0.001]; sadness— 
Measy = 0.88, Mdifficult = 0.76 [f(268) = 9.81, p < 0.001]; 

disgust— Measy = 0.89, Mdifficult = 0.82 [f(268) = 4.62, 

p < 0.001]; and anger— Measy = 0.77, Mdifficult = 0.45 
[f(268) = 13.93, p < 0.001]. We conclude that performance 
indicators derived from this task have acceptable psychometric 
quality. Empirical difficulty levels differ across the intended 
manipulations based on expression intensity and the task 
revealed a rank order of recognizability similar to other tasks 
used in this study. The scoring procedure hardly affected the rank 
order of persons, allowing the conclusion that different scores 
derived from this task express the same emotional expression 
discrimination ability. 

TASK 4: EMOTION HEXAGON— IDENTIFICATION OF MIX-RATIOS IN 
EXPRESSION CONTINUA 

It is suggested that the encoding of facial emotion expressions 
is based on discrete categorical (qualitative) matching (Etcoff 
and Magee, 1992; Calder et al., 1996), but also on the multidi- 
mensional perception of continuous information (Russell, 1980). 
There is evidence that both types of perception are integrated 
and used complementary (Fujimura et al, 2012). In this task, 
we required participants to determine the mixture ratios of two 
prototypical expressions of emotions. In order to avoid memory- 
related processes we constructed a simultaneous matching task. 
We morphed expressions of two emotions along a continuum 
of 10 mixture ratios. We only morphed continua between adja- 
cent emotions on a so-called emotion hexagon (with the sequence 
happiness-surprise-fear-sadness-disgust-anger), where proximity 
of emotions represents potentially stronger confusion between 
expressions (e.g., Calder et al., 1996; Sprengelmeyer et al, 1996). 
In terms of categorical perception, there should be an advantage 
in identifying the correct mixture-ratio at the end of a contin- 
uum compared with more balanced stimuli in the middle of 
the continuum between two expression categories (Calder et al., 
1996). 

Procedure 

Morphed images were created from two different expressions 
with theoretically postulated and empirically tested maximal 
confusion rates (Ekman and Friesen, 1976). Thus, morphs 
were created on the following six continua: happiness-surprise, 
surprise-fear, fear-sadness, sadness-disgust, disgust-anger, and 



anger-happiness. The mixture ratios were composed in 10% steps 
from 95:5 to 5:95. These morphs were created for each face 
separately for five female and five male models. 

In every trial, two images of the same identity were presented 
on the upper left and on the upper right side of the screen, where 
each image displayed a different prototypical emotion expres- 
sion (happiness, surprise, fear, sadness, disgust, and anger). Below 
these faces, centered on the screen, was a single expression mor- 
phed from the prototypical faces displayed in the upper part of the 
screen. All three faces remained on the screen until participants 
responded. Participants were asked to estimate the ratio of the 
morphed photo on a continuous visual analog scale. Participants 
were then instructed that the left and right ends of the scale rep- 
resent a 100% agreement respectively with the images presented 
in the upper left and upper right side of the screen, and the mid- 
dle of the scale represents a proportion of 50:50 from both parent 
faces. Participants were asked to estimate the mixture-ratio of the 
morph photo as exactly as possible, using the full range of the 
scale. There were no time limits. Three practice trials preceded 60 
experimental trials. We scored performance accuracy as the aver- 
age absolute deviation of participants' response from the correct 
proportion of the mixture between the two parent expressions. 

Results and discussion 

Table 2 displays the average overall and emotion specific devia- 
tion scores. An rmANOVA revealed that the emotion combina- 
tions used in this task were less influential than in other tasks, 
[^■(5, 1340) = 106.27, p < 0.001, Ti^ = 0.08]. Reliability estimates 
were excellent for the overall score (a = 0.93; co = 0.94) and sat- 
isfactory for emotion-specific scores (oj ranged between 0.64 and 
0.80). Further, it was interesting to investigate whether perfor- 
mance was higher toward the ends of the continua as predicted 
by categorical accounts of emotional expression perception. An 
rmANOVA with the within-subject factor mixture ratio (levels: 
95, 85, 75, 65, and 55% of the prevailing emotional expres- 
sion) showed a significant effect, [_F(4 1072) = 85.27, p < 0.001, 
f]^ = 0.13]. As expected, deviation scores were lowest at mixture 
ratios of 95% of a parent expression and increased with decreasing 
contributions of the prevailing emotion: Mgs% = 9.04, Ms5% = 
13.33, M75o/„ = 15.89, M65% = 17.11, Msso/^ = 16.09. There was 
no significant difference between the mixture levels 75, 65, and 
55% of the target parent expression. A series of two-tailed paired 
t-tests compared differences between the emotion categories of 
the parent photo. The correct mixture ratio was better identi- 
fied in the following combinations: performance in happiness 
with surprise combinations was slightly better than combina- 
tions of happiness with anger, [f(268) = 1-78, p = 0.08]; surprise 
with happiness was easier to identify than surprise with fear, 
[f(268) = 12.23, p < 0.001]; fear with sadness better than with 
surprise, [f(268) = 9.67, p < 0.001]; disgust with sadness better 
than with anger, [f(268) = 7.93, p < 0.001]; and anger with hap- 
piness better than with disgust, [f(268) = 4.06, p < 0.001]. For 
sadness there was no difference between fear and disgust, [t(268) = 
0.37, p = 0.36]. Generally, we expected mixtures of more simi- 
lar expressions to bias the evaluation of the morphs. The results 
are essentially in line with these expectations based on expression 
similarities. 
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Taken together, the results suggest the deviation scores meet 
psychometric standards. Performance improved or worsened as 
predicted by theories of categorical perception. Future research 
should examine whether expression assignment in morphed 
emotions is indicative of the ability to identify prototypical emo- 
tion expressions. 

TASK 5: EMOTION HEXAGON— DISCRIMINATION 

This task is a forced choice version of the previously described 
Task 4 and aims to measure categorical perception of emotional 
expressions using a further assessment method. 

Procedure 

Participants were asked to decide whether the morphed expres- 
sion presented in the upper middle of the screen was more similar 
to the expression prototype displayed on the lower left or lower 
right side of the screen. Stimuli were identical with those used in 
Task 4, but the sequence of presentation was different. The task 
design differed from that of Task 4 only in that participants were 
forced to decide whether the expression-mix stimulus was com- 
posed of more of the left or more of the right prototypical expres- 
sion. Response keys were the left and right control keys on the reg- 
ular computer keyboard, which were marked with colored tape. 

Results and discussion 

The average percentages of correct decisions are given in Table 2. 
This task was rather easy compared with Tasks 1-3. The distribu- 
tion of the scores was, however, not strongly skewed to the right, 
but rather followed a normal distribution with most of the par- 
ticipants performing within the range of 0.70-0.85; therefore, this 
task can be used to measure individual differences in performance 
accuracy. An rmANOVA revealed that the expressed emotion 
affected recognition accuracy, [-F(5_ 1340) = 172.94, p < 0.001, 
= 0.33]. Similarly to Task 4, the rank order of emotion rec- 
ognizability was not similar to Tasks 1 or 2. An rmANOVA with 
factor mixture ratio (levels corresponding to those from Task 4) 
showed a significant effect, [f(4, 1072) = 101.95, p < 0.001, = 
0.21]. Discrimination rates were highest at mixture ratios of 95 
and 85% and decreased with decreasing ratio of the prevailing 
emotion. Reliability estimates of the overall score were admissi- 
ble (a = 0.63; eo = 0.64) but rather poor for the emotion-specific 
scores (see Table 2), probably due to several items with skewed 
distributions and thus poor psychometric quality. Generally, the 
psychometric properties of this task need improvement and fur- 
ther studies should address the question whether forced-choice 
expression assignment in emotion-morphs is indicating the same 
ability factor indicated by the other tasks (i.e., emotion identifica- 
tion and discrimination of prototypical expressions). 

LEARNING AND RECOGNITION TASKS OF FACIAL EMOTION 
EXPRESSIONS 

The following five tasks arguably assess individual differences 
in memory-related abilities in the domain of facial expres- 
sions. All tasks consist of a learning phase for facial expressions 
and a subsequent retrieval phase that requires recognition or 
recall of previously learned expressions. The first three memory 
tasks include an intermediate task between learning and recall 



of at least three minutes, hence challenging long-term reten- 
tion. In Task 9 and 10, learning is immediately followed by 
retrieval. These tasks should measure primary and secondary 
memory (PM and SM; Unsworth and Engle, 2007) of emotion 
expressions. 

TASK 6: LEARNING AND RECOGNITION OF EMOTION EXPRESSIONS OF 
DIFFERENT INTENSITY 

With this forced-choice SM task we aimed to assess the abil- 
ity to learn and recognize facial expressions of different inten- 
sity. Emotion category, emotion intensity, and learning-set size 
varied across trials, but face identity was constant within a 
block of expressions that the participant was asked to learn 
together. Manipulations of expression intensity within tar- 
gets, but also between targets and distracters, were used to 
increase task difficulty. The recognition of expression inten- 
sity is also a challenge in everyday life; hence, the expres- 
sion intensity manipulation is not restricted to psychometric 
rationales. 

The combination of six emotional expressions with three 
intensity levels (low — the target emotion expression intensity 
was above 60%, medium — intensity above 80%, high — intensity 
above 95%) resulted in a matruc with 18 conceivable stimuli cat- 
egories for a trial block. We expected hit-rates to decline with 
increasing ambiguity for less intense targets (e.g., see the effects 
of inter-item similarity on visual-memory for synthetic faces 
reported by Yotsumoto et al., 2007) and false alarm rates to grow 
for distracters of low intensity (e.g., see effects of target-distracter 
similarity in face recognition reported by Davies et al, 1979). 

Procedure 

We administered one practice block of trials and four experimen- 
tal blocks — including four face identities (half were females) and 
18 trials per block. Each block started by presenting a set of tar- 
get faces of the same face identity but with different emotion 
expressions. To-be-learned stimuli were presented simultaneously 
in a line centered on the screen. Experimental blocks differed in 
the number of targets, expressed emotion, expression intensity, 
and presentation time. Presentation time ranged from 30 to 60 s 
depending on the number of targets within a block (two up to five 
stimuli). Facial expressions of six emotions were used as targets as 
well as distracters (happiness, surprise, anger, fear, sadness, and 
disgust). 

Participants were instructed to remember the combination of 
both expression and intensity. During a delay phase of about 
three minutes, participants worked on a two-choice RT task (they 
had to decide whether two simultaneous presented number series 
are the same or different). Recall was structured as a pseudo- 
randomized sequence of 18 single images of targets or distracters. 
Targets were identical with the previously learned expressions 
in terms of emotional content and intensity, but different pho- 
tographs of the same identities were used in order to reduce 
effects of simple image recognition. Distracters differed from 
the targets in both expression content and intensity. Participants 
were requested to provide a two-choice discrimination decision 
between learned and distracter expressions on the keyboard. After 
a response, the next stimulus was presented. 
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Results and discussion 

The average performance accuracy over all trials and across 
trials of specific emotion categories is presented in Table 3. 
Emotion significantly affected recognition performance, 
[P{5, 1340) = 120.63, p < 0.001, Ti^ = 0.24]. Pairwise compar- 
ison based on adjusted p-values for simultaneous inference 
using the Bonferroni-method showed that participants were 
better at recognizing happiness relative to all other emotions. 
Additionally, expressions of anger were significantly better 
retrieved than surprise, fear, or disgust expressions. There 
were no additional performance differences due to emotion 
content. 



Intensity manipulation was partly successful: High inten- 
sity stimuli were recognized {Mgs% = 0.86; SDgs% = 0.10) 
better than medium or low intensity stimuli (Mgoo/o = 0.70; 
SDgoo/o = 0.08; Mgoo/o = 0.74;SD6oo/„ = 0.10); [F(2, 536) = 236.24, 
p < 0.001, Ti^ = 0.35]. While performance on low intensity stim- 
uli was slightly better than performance on medium intensity 
stimuli, we believe this effect reflects a Type 1 error and will not 
replicate in an independent sample. We recommend using high 
vs. low intensity (95 and 60% expressions) as difficulty manip- 
ulation in this task in future studies. Reliability estimates are 
provided in Table 3 and suggest good psychometric properties 
for the overall task. Reliabilities for emotion-specific trials are 



Table 3 | Descriptive statistics and reliability estimates of performance accuracy for all emotion memory tasks— across all trials and for single 
target emotions (If applicable). 



Condition 


Accuracy M {SD, SE) 


Alternative score M {SD, SE) 


Alpha / Omega / # of Trials 






Overall (range: 0.49-0.89) 


0.76 (0.06, 0.00) 


1.50 (0.63, 0.04)** 


0.76/0.76/72 


Happiness (1 *) 


0.89 (0.10, 0.01) 


Overall accuracy and d'prime score: r = 0.72 


0.50/0.53/10 


Surprise (4) 


0.73 (0.10, 0.01) 




0.53/0.54/14 


Fear (4) 


0.73 (0.13, 0.01) 




0.51/0.52/13 


Sadness (3) 


0.74 (0.11, 0.01) 




0.49/0.50/11 


Disgust (4) 


0.73 (0.11, 0.01) 




0.62/0.64/14 


Anger (2) 


0.76 (0.10, 0.01) 




0.50/0.51/10 


TASK 7: LEARNING AND RECOGNITION OF EMOTIONAL EXPRESSIONS FROM DIFFERENT VIEWPOINTS ^^^^^^H^^^^^^H 


Overall (range: 0.46-0.91) 


0.75 (0.08, 0.01) 


1.46 (0.63, 0.04)** 


0.75/0.75/56 


Happiness (2) 


0.79 (0.16, 0.01) 


Overall accuracy and d'prime score: r = 0.94 


0.34/0.38/08 


Surprise (1 *) 


0.81 (0.15, 0.01) 




0.45/0.46/08 


Fear (5) 


0.72 (0.14, 0.01) 




0.27/0.28/10 


Sadness (3) 


0.77 (0.14, 0.01) 




0.38/0.39/09 


Disgust (4) 


0.73 (0.14, 0.01) 




0.44/0.45/12 


Anger (6) 


0.68 (0.14, 0.01) 




0.26/0.30/09 


TASK 8: LEARNING AND RECOGNITION OF MIXED EMOTION EXPRESSIONS IN EXPRESSION MORPHS ^^^^^H 


Overall (range: 0.43-0.89) 


0.69 (0.08, 0.00) 


1.11 (0.55, 0.03)** 

Overall accuracy and d'prime score: r = 0.92 


0.68/0.68/56 




Overall (range: 0.09-0.75) 


0.45 (0.13, 0.01) 




0.59/0.59/32 


Happiness (1 *) 


0.54 (0.25, 0.02) 




0.34/0.42/05 


Surprise (2) 


0.42 (0.23, 0.01) 




0.33/0.34/06 


Fear (4) 


0.38 (0.22, 0.01) 




0.16/0.23/05 


Sadness (3) 


0.39 (0.24, 0.01) 




0.22/0.35/05 


Disgust (1) 


0.55 (0.22, 0.01) 




0.29/0.31/06 


Anger (3) 


0.39 (0.22, 0.01) 




0.16/0.18/05 




Overall (range: 0.07-0.78) 


0.42 (0.13, 0.01) 




0.58/0.60/27 


Happiness (1 *) 


0.52 (0.26, 0.02) 




0.20/0.37/04 


Surprise (2) 


0.46 (0.21, 0.01) 




0.27/0.28/05 


Fear (4) 


0.42 (0.20, 0.01) 




0.14/0.18/04 


Sadness (6) 


0.23 (0.21, 0.01) 




0.20/0.22/03 


Disgust (5) 


0.38 (0.24, 0.01) 




0.26/0.27/05 


Anger (3) 


0.44 (0.19, 0.01) 




0.27/0.30/06 



M, means, SE, standard errors, SD, standard deviations; Alpha (a) and Omega (to) coefficients; # of Trials, the number of trials used for calculating the reliability 
coefficients; * the rank order of recognizability across emotions is indicated in the brackets: ", d'prime score; the chance probability in case of Task 6, 7, and 8 is 
0.50 and cannot be computed for tasks 9 and 10. 
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acceptable considering the low number of indicators and the het- 
erogeneity of facial emotion expressions in general. In sum, we 
recommend using two levels of difficulty and the overall perfor- 
mance as indicators of expression recognition accuracy for this 
task. 

TASK 7: LEARNING AND RECOGNITION OF EMOTIONAL EXPRESSIONS 
FROM DIFFERENT VIEWPOINTS 

In this delayed recognition memory task, we displayed facial 
expressions with a frontal view as well as right and left three- 
quarter views. We aimed to assess long-term memory bindings 
between emotion expressions and face orientation. Thus, in order 
to achieve a correct response, participants needed to store both 
the emotion expressions and the viewpoints. This task is based on 
the premise that remembering content-context bindings is crucial 
in everyday socio-emotional interactions. An obvious hypothe- 
sis regarding this task is that emotion expressions are recognized 
more accurately from the frontal view than from the side, because 
more facial muscles are visible from the frontal view. On the other 
side, Matsumoto and Hwang (2011) reported that the presenta- 
tion of emotional expressions in hemi-face profiles did not lower 
accuracy rates of recognition relative to frontal views. It is impor- 
tant to note that manipulation of the viewpoint is confounded 
with manipulating gaze direction in the present task. Adams and 
Kleck (2003) discuss effects of gaze direction on the processing 
of emotion expressions. A comparison of accuracy rates between 
frontal and three-quarter views is therefore interesting. 

Procedure 

This task includes one practice block and four experimental 
blocks with each including only one face identity and consisting 
of 12-16 recall-trials. During the initial learning phase, emotion 
expressions from different viewpoints were simultaneously pre- 
sented. The memory set size varied across blocks from four to 
seven target stimuli. Targets differed according to the six basic 
emotions and the three facial perspectives (frontal, left, and right 
profile views). Presentation time changed depending on the num- 
ber of stimuli presented during the learning phase and ranged 
between 30 and 55 s. Participants were explicitly instructed to 
memorize the association between expressed emotion and per- 
spective. This was followed by a verbal two-choice RT distracter 
task for about one minute where participants decided whether a 
presented word contained the letter "A." During the recall phase, 
novel images, which showed the emotion expression from the 
same perspective as in the learning phase, served as the tar- 
gets. These images were shown in a pseudo-randomized sequence 
intermixed with distracters, which differed from the targets in 
expression, perspective, or both. Participants were asked to decide 
whether or not a given image had been shown during the learning 
phase by pressing one of two buttons on the keyboard. After the 
participants chose their response, the next trial started. 

Results and discussion 

Table 3 displays the performance accuracy for this task. The aver- 
age scores suggest adequate levels of task difficulty — well above 
guessing probability and below ceiling. Reliability estimates for 
the overall task reveal good psychometric quality (a = 0.75; m = 



0.75). Reliability estimates for the emotion specific trials were 
considerably lower; these estimates might be raised by increas- 
ing the number of stimuli per emotion category. An rmANOVA 
suggested rather small but significant performance differences 
between emotion categories, [-F(5, 1340) = 38.06, p < 0.001, yy^ = 
0.09]. Pairwise comparisons showed that expressions of happiness 
and surprise were recognized the best and anger and fear were rec- 
ognized the worst. Viewpoint effects were as expected and contra- 
dict the results from Matsumoto and Hwang (2011). Expressions 
were recognized significantly better if the expression was learned 
with frontal rather than a three quarter view: 268) = 13.60, 
p < 0.001], however the mean difference between these two per- 
spectives was low (Mdiff = 0.03; n]^ = 0.02). We recommend 
using face orientation as a difficulty manipulation and the overall 
performance across trials as indicator of expression recognition. 

TASK 8: LEARNING AND RECOGNITION OF MIXED EMOTION 
EXPRESSIONS IN EXPRESSION MORPHS 

With this task, we intended to assess recognition performance of 
mixed — rather than prototypical — facial expressions. It was not 
our aim to test theories that postulate combinations of emotions 
that result in complex affect expressions, such as contempt, which 
is proposed as a mixture of anger and disgust or disappointment, 
which is proposed as a combination of surprise and sadness (cf 
(Plutchik, 2001)). Instead, we aimed to use compound emotion 
expressions to assess the ability to recognize less prototypical, and 
to some extent more real-life, expressions. Furthermore, these 
expressions are not as easy to label as the basic emotion expres- 
sions parenting the mixed expressions. Therefore, for the mixed 
emotions of the present task we expect a smaller contribution of 
verbal encoding to task performance, as has been reported for 
face recognition memory for basic emotions (Nakabayashi and 
Burton, 2008). 

Procedure 

We used nine different combinations of six basic emotions 
(Plutchik, 2001): Fear/disgust, fear/sadness, fear/happiness, 
fear/surprise, anger/disgust, anger/happiness, sadness/surprise, 
sadness/disgust, and happiness/surprise. Within each block of 
trials, the images used for morphing muced expressions were 
from a single identity. Across blocks, sex of the identities was 
balanced. There were four experimental blocks preceded by a 
practice block. The number of stimuli to be learned ranged from 
two targets in Block 1 to five targets in Block 4. The presentation 
time of the targets during the learning period changed depending 
on the number of targets displayed, ranging from 30 to 60 s. 
Across blocks, 11 targets showed morphed mixture ratios of 
50:50 and five targets showed one dominant expression. During 
the learning phase, stimuli were presented simultaneously on the 
screen. During a delay period of approximately three minutes, 
participants answered a subset of questions from the Trait Meta 
Mood Scale (Salovey et al., 1995). At retrieval, participants saw 
a pseudo-randomized sequence of images displaying mixed 
expressions. Half of the trials were learned images. The other 
trials differed from the learned targets in the expression mixture, 
in the mixture ratio, or both. Participants were asked to decide 
whether or not a given image had been shown during the learning 
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phase by pressing one of two buttons on the keyboard. There 
were 56 recall trials in this task. The same scoring procedures 
were used as in Task 7. 

Results and discussion 

The average performance over all trials (see Table 3) was well 
above chance. Different scoring procedures hardly affected the 
rank order of individuals within the sample; the proportion cor- 
rect scores were highly correlated with the d'prime scores (0.92; 
p < 0.001). Reliability estimates suggest good psychometric qual- 
ity. Further studies are needed to investigate whether learning and 
recognizing emotion-morphs are tapping the same ability factor 
as learning and recognizing prototypical expressions of emotion. 
Because expectations on mean differences at recognizing expres- 
sion morphs are difficult to derive from a theoretical point of 
view, we only consider the psychometric quality of the overall 
score for this task. 

TASK 9: CUED EMOTIONAL EXPRESSIONS SPAN 

Memory span paradigms are frequently used measures of primary 
memory. The present task was designed as a serial cued memory 
task for emotion expressions of different intensity. Because recog- 
nition was required in the serial order of the stimuli displayed 
at learning, the sequence of presentation served as a temporal- 
context for memorizing facial expressions. We used FaceReader 
(see above) to score intensity levels of the stimuli chosen for this 
task. 

Procedure 

Based on the FaceReader scores, we categorized the facial stim- 
uli into low intensity (60%), medium intensity (70%), and high 
intensity (80%) groups. We used three male and four female iden- 
tities throughout the task, with one identity per block. The task 
began with a practice block followed by seven experimental blocks 
of trials. Each block started with a sequence of facial expressions 
(happiness, surprise, fear, sadness, disgust, and anger), presented 
one at a time, and was followed immediately by the retrieval 
phase. The sequence of targets at retrieval was the same as the 
memorized sequence. Participants were suggested to use the serial 
position as memory cue. Number of trials vnthin a sequence 
varied between three and six. Most of the targets (25 of 33 
images) and distracters (37 of 54 images) displayed high intensity 
prototypical expressions. 

During the learning phase stimulus presentation was fixed to 
500 ms, followed by a blank inter-stimulus interval of another 
500 ms. At retrieval, the learned target-expression was shown 
simultaneously with three distractors in a 2 x 2 matrix. The posi- 
tion of the target in this matrix varied across trials. Distracters 
within a trial differed from the target in its emotional expression, 
intensity, or both. Participants indicated the learned expression 
via mouse click on the target image. 

Results and discussion 

Table 3 provides performance and reliability estimates. Average 
performance ranged between 0.38 and 0.55, was clearly above 
chance level, and with no ceiling effect. Reliability estimates 
for the entire task are acceptable; reliability estimates for the 



emotion-specific trials were low; increasing the number of tri- 
als could improve reliabilities for the emotion-specific trials. 
An rmANOVA suggested significant but small performance dif- 
ferences across the emotion categories, [f(5^ 1340) = 36.98, p < 
0.001, = 0.09]; pairwise comparisons revealed participants 
were better at retrieving happiness and disgust expressions com- 
pared with all other expressions; no other differences were statisti- 
cally significant. We therefore recommend the overall percentage 
correct score as a psychometrically suitable measure of individual 
differences of primary memory for facial expressions. 

TASK 10: THE GAME MEMORY WITH EMOTION EXPRESSIONS 

This task is akin to the well-known game "Memory." Several pairs 
of emotion expressions, congruent in emotional expression and 
intensity, were presented simultaneously for a short time. The 
task was to quickly detect the particular pairs and to memo- 
rize them in conjunction with their spatial arrangement on the 
screen. Successful detection of the pairs requires perceptual abili- 
ties. During retrieval, one expression was automatically disclosed 
and participants had to indicate the location of the correspond- 
ing expression. Future work might decompose perceptual and 
mnestic demands of this task in a regression analysis. 

Procedure 

At the beginning of a trial block several expressions initially cov- 
ered with a card deck appeared as a matrix on the screen. During 
the learning phase, all expressions were automatically disclosed 
and participants were asked to detect expression pairs and to 
memorize their location. Then, after several seconds, the learn- 
ing phase was stopped by the program, and again the cards 
were displayed on the screen. Next, one image was automatically 
disclosed and participants indicated the location of the corre- 
sponding expression vnth a mouse click. After the participant's 
response, the cUcked image was revealed and feedback was given 
by encircling the image in green (correct) or red (incorrect). Two 
seconds after the participant responded, the two images were 
again masked with the cards, and the next trial started by pro- 
gram flipping over another card to reveal a new image. Figure 4 
provides a schematic representation of the trial sequence within 
an experimental block. 

Following the practice block, there were four experimen- 
tal blocks of trials. Expression matrices included three (one 
block), six (one block), and nine (two blocks) pairs of expres- 
sions that were distributed pseudo-randomized across the lines 
and columns. Presentation time for learning depended on the 
memory set size: 20, 40, and 60 s for three, six, and nine expres- 
sion pairs, respectively. Within each block each image pair was 
used only once, resulting in 27 responses, representing the total 
number of trials for this task. 

Results and discussion 

The average proportion of correctly identified emotion pairs and 
reliability estimates, are summarized in Table 3. Similar to Task 9, 
guessing probability is much lower than 0.50 in this task; there- 
fore the overall accuracy of 0.40 is acceptable. Reliability is also 
good. Due to the low number of trials within one emotion cat- 
egory, these reliabilities are rather poor, but could be increased 
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FIGURE 4 I Schematic representation of a trail block from Task 10 (Memory for facial expression of emotion). 



by including additional trials. There was a small but significant 
effect of emotion category on performance accuracy, [F(s, 1340) = 
68.74,p < 0.001, Ti^ = 0.15], as indicated by an rmANOVA. Pairs 
of happiness, surprise, anger, and fear expressions were remem- 
bered the best and sadness was remembered the worst. In the 
current version, we recommend the overall score as a psychome- 
tricaUy suitable performance indicator of memory for emotional 
expressions. 

SPEED TASKS OF PERCEIVING AND IDENTIFYING FACIAL 
EMOTION EXPRESSIONS 

We also developed speed indicators of emotion perception and 
emotion recognition ability following the same rationale as 
described by Herzmann et al. (2008) and Wilhelm et al. (2010). 
Tasks that are so easy that the measured accuracy levels are at 
ceiling allow us to gather individual differences in performance 
speed. Therefore, for the following tasks we used stimuli with high 
intensity prototypical expressions for which we expected recogni- 
tion accuracy rates to be at or close to ceiling (above 0.80). Like 
the accuracy tasks described above, the speed tasks were intended 
to measure either emotion perception (three tasks) or emotion 
recognition (three tasks). Below we describe the six speed tasks 
and report results on their psychometric properties. 

TASK 11: EMOTION PERCEPTION FROM DIFFERENT VIEWPOINTS 

Recognizing expressions from different viewpoints is a cru- 
cial socio-emotional competence relevant for everyday interac- 
tion. Here, we aimed to assess the speed of perceiving emotion 



expressions from different viewpoints by using a discrimination 
task with same-different choices. 

Procedure 

Two same-sex images with different facial identities were pre- 
sented next to each other. One face was shown with a frontal view 
and the other in three-quarter view. Both displayed one of the 
six prototypical emotion expressions. Participants were asked to 
decide as fast and accurately as possible whether the two persons 
showed the same or different emotion expressions by pressing one 
of two marked keys on the keyboard. 

There was no time limit on the presentation. Participants' 
response started the next trial, after the presentation of a 1.3 s 
blank interval. Trials were pseudo-randomized in sequence and 
were balanced for expression match vs. mismatch, side of pre- 
sentation, position of the frontal and three-quarter view picture, 
as well as for the face identity sex. To ensure high accuracy 
rates, confusable expressions according to the hexagon model 
(Sprengelmeyer et al, 1996) were never presented together in 
mismatch trials. There was a practice block of six trials with feed- 
back. Experimental trials started when the participant achieved a 
70% success rate in the practice trials. There were 3 1 experimen- 
tal trials. Each of the sbc basic emotions occurred in match and 
mismatch trials. 

Results and discussion 

Average accuracies and RTs, along with average inverted latency 
(see general description of scoring procedures above) are 



www.frontiersln.org 



May 2014 | Volume 5 | Article 404 | 15 



Wilhelm et al. 



Measuring facial expression perception and recognition 



presented in Table 4. As required for speed tasks, accuracy rates 
were at ceUing. RTs and inverted latencies showed that partici- 
pants needed about two seconds on average to correctly match 
the two facial expressions presented in the frontal vs. three 



quarter view. An rmANOVA of inverted RTs revealed differ- 
ences in the expression matching speed across emotion cate- 
gories, [f(5, 1340) = 263.84, p < 0.001, t]^ = 0.22]. Bonferroni- 
adjusted pairwise comparisons indicate the strongest difference 



Table 4 | Descriptive statistics and reliability estimates of performance speed for all speed measures of emotion perception— across all trials 
and for single target emotions. 



Condition Accuracy M [SD, SE) Reaction time; 1000/reaction Alpha / Omega / # of Trials 

time; M (SD, SE) 





Overall 


0.90 (0.06, 0.00) 


2181 (686,41) 
0.59 (0.15, 0.01) 


0.95/0.96/31 


Happirness 


0.98 (0.09, 0.01) 


1325 (549, 34) 
0.86 (0.23, 0.01) 


0.66/0.66/02 


Surprise 


0.94 (0.17, 0.01) 


1966 (818, 53) 
0.62 (0.20, 0.01) 


0.51/0.51/02 


Fear 


0.95 (0.11, 0.01) 


2018 (733, 49) 
0.62 (0.17 0.01) 


0.46/0.47/04 


Sadrness 


0.90 (0.11, 0.01) 


2426 (841, 21) 
0.58 (0.18, 0.01) 


0.88/0.88/09 


Disgust 


0.85 (0.13, 0.01) 


2115 (673, 76) 
0.58 (0.15, 0.01) 


0.78/0.79/07 


Anger 


0.96 (0.08, 0.01) 


2033 (638, 45) 
0.62 (0.16, 0.01) 


0.82/0.83/07 


TASK EMOTIONAL ^^^^^^^^^^^^ .^^^^^^^^^^^^^^^T^^^^^^M 


Overall 


0.96 (0.05, 0.00) 


2266 (761, 46) 
0.56 (0.15, 0.01) 


0.96/0.96/30 


Happiness 


0.98 (0.06, 0.00) 


1 844 (654, 42) 
0.65 (0.18, 0.01) 


0.72/0.73/05 


Surprise 


0.95 (0.13, 0.01) 


2224 (936, 71) 
0.54 (0.16, 0.01) 


0.73/0.73/05 


Fear 


0.94 (0.12, 0.01) 


2672 (983, 74) 
0.49 (0.16, 0.01) 


0.67/0.67/05 


Sadness 


0.95 (0.10, 0.01) 


2453 (916, 62) 
0.53 (0.18, 0.01) 


0.75/0.75/05 


Disgust 


0.98 (0.06, 0.00) 


2252 (901 , 50) 
0.57 (0.18, 0.01) 


0.70/0.71/05 


Anger 


0.97 (0.08, 0.00) 


2301 (940, 61) 
0.57 (0.18, 0.01) 


0.72/0.73/05 


TASK 13: IDENTIFICATION SPEED OF EMOTIONAL EXPRESSIONS 






Overall 


0.93 (0.05, 0.00) 


3043 (774, 47) 
0.42 (0.09, 0.01) 


0.96/0.96/48 


Happiness 


0.98 (0.05, 0.00) 


1893 (548, 35) 
0.63 (0.16, 0.01) 


0.78/0.79/08 


Surprise 


0.99 (0.04, 0.00) 


2921 (832, 53) 
0.43 (0.11, 0.01) 


0.77/0.78/08 


Fear 


0.85 (0.16, 0.01) 


4078 (955, 96) 
0.30 (0.08, 0.00) 


0.77/0.77/08 


Sadness 


0.90 (0.11, 0.01) 


3462 (952, 90) 
0.35 (0.10, 0.01) 


0.78/0.78/08 


Disgust 


0.95 (0.08, 0.00) 


2794 (791, 60) 
0.42 (0.10, 0.01) 


0.78/0.78/08 


Anger 


0.90 (0.12, 0.01) 


3281 (948, 81) 
0.36 (0.10, 0.01) 


0.81/0.81/08 



M, means, SE, standard errors, SD, standard deviations; Alpha (a) and Omega (u>) coefficients calculated for speed measures f 1000/reaction time); # of Trials, the 
number of trials used for calculating reliability coefficients. 
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in performance between matching emotion expressions occurred 
between happiness compared to all other emotions. Other sta- 
tistically significant, but small effects indicated that performance 
matching surprise, fear, and anger expressions was faster than 
performance matching sadness and disgust. Reliability estimates 
are excellent for the overall score and acceptable for the emotion 
specific trials. However, happiness, surprise, and fear expressions 
were less frequently used in this task. Reliabilities of emotion- 
specific scores could be increased by using more trials in future 
applications. 

TASK 12: EMOTIONAL ODD-MAN-OUT 

This task is a revision of the classic Odd-Man-Out task (Frearson 
and Eysenck, 1986), where several items are shown simultane- 
ously of which one — the odd-man-out — differs from the others. 
Participants' task is to indicate the location of the odd-man-out. 
The emotion-expression version of the task — as implemented 
by Herzmann et al. (2008) and in the present study — requires 
distinguishing between different facial expressions of emotion 
presented within a trial in order to detect the "odd" emotional 
expression. 

Procedure 

Three faces of different identities (but of the same sex), each dis- 
playing an emotion expression, were presented simultaneously in 
a row on the screen. The face in the center displayed the refer- 
ence emotion from which either the left or right face differed 
in expression, whereas the remaining third face displayed the 
same emotion. Participants had to locate the divergent stimu- 
lus (odd-man-out) by pressing a key on the corresponding side. 
The next trial started after a 1.3-s blank interval. Again, we 
avoided combining highly confusable expressions of emotions in 
the same trial to ensure high accuracy rates (Sprengelmeyer et al., 
1996). Five practice trials with feedback and 30 experimental trials 
were administered in pseudo-randomized order. Each emotion 
occurred as both a target and as a distracter. 

Results and discussion 

Table 4 displays relevant results for this task. Throughout, accu- 
racy rates were very high for all performance indicators, demon- 
strating the task to be a measure of performance speed. On 
average, participants needed about 2 s to detect the odd-man-out. 
There were statistically significant, but small performance differ- 
ences depending on the emotion category, [_F(5, 1340) = 109.43, 
p < 0.001,77^ = 0.08]. Differences mainly occurred between hap- 
piness and all other expressions. In spite of the small number of 
trials per emotion category (5), reliability estimates of the over- 
all score based on inverted latencies are excellent and good for 
all emotion specific scores. We conclude that the overall task and 
emotion specific trial scores have good psychometric quality. 

TASK 13: IDENTIFICATION SPEED OF EMOTIONAL EXPRESSIONS 

The purpose of this task is to measure the speed of the visual 
search process (see Task 3) involved in identifying an expression 
belonging to an indicated expression category. Here, an emotion 
label, a targeted emotional expression, and three mismatching 
alternative expressions, were presented simultaneously on the 
screen. The number of distracters was low in order to minimize 



task difficulty. Successful performance on this task requires a 
correct link of the emotion label and the facially expressed emo- 
tion and an accurate categorization of the expression to the 
appropriate semantic category. 

Procedure 

The name of one of the six basic emotions was printed in the cen- 
ter of the screen. The emotion label was surrounded in horizontal 
and vertical directions by four different face identities of the same 
sex all displaying different emotional expressions. Participants 
were asked to respond with their choice by using the arrow-keys 
on the number block of a regular keyboard. There were two prac- 
tice trials at the beginning. Then, each of the six emotions was 
used eight times as a target in a pseudorandom sequence of 48 
experimental trials. There were no time limits for the response, 
but participants were instructed to be as fast and accurate as 
possible. The ISI was 1300 ms. 

Results and discussion 

Average performance, as reflected by the three relevant scores for 
speed indicators, are depicted in Table 4. Accuracy rates were at 
ceiling. An rmANOVA of inverted latencies showed strong differ- 
ences in performance speed for the different emotional expres- 
sions, [F(5. 1340) = 839.02, p < 0.001, ti^ = 0.48]. Expressions of 
happiness and surprise were detected the fastest, followed by dis- 
gust and anger, and finally sadness and fear. Reliability estimates 
were excellent for the overall score and good for emotion specific 
performance scores. All results substantiate that the scores derived 
from Task 12 reflect the intended difficulty for speed tasks and 
have good psychometric properties. 

SPEED TASKS OF LEARNING AND RECOGNIZING FACIAL 
EMOTION EXPRESSIONS 

TASK 14: 7-BACK RECOGNITION SPEED OF EMOTIONAL EXPRESSIONS 

In the M-back paradigm, a series of different pictures is presented; 
the task is to judge whether a given picture has been presented n 
pictures before. It has been traditionally used to measure working 
memory (e.g., Cohen et al, 1997). The i-back condition requires 
only minimal effort on storage and processing in working mem- 
ory. Therefore, with the i-back task using emotion expressions we 
aimed to assess recognition speed of emotional expressions from 
working memory and expected accuracy levels to be at ceiling. 

Procedure 

We administered a i-back task with one practice block and 
four experimental blocks of trials. Each experimental block con- 
sisted of a sequence of 24 different images originating from 
the same identity displaying all six facial emotional expres- 
sions. Participants were instructed to judge whether the emo- 
tional expression of each image was the same as the expression 
presented in the previous trial. The two-choice response was 
given with a left or right key (for mismatches and matches, 
respectively) on a standard keyboard. The next trial started 
after the participant provided their response, with a fixation 
cross presented on a blank screen for 200 ms in between tri- 
als. Response time was not limited by the experiment. Practice 
trials with feedback needed to be completed with at least 80% 
accuracy in order to continue with the experimental blocks. 
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All basic emotion expressions were presented as targets in at 
least one experimental block. Target and distracters were pre- 
sented at a ratio of 1:4 and there were 24 target trials in 
total. 



Results and discussion 

Table 5 summarizes the average accuracies, RTs, and inverted 
latencies. As expected, accuracies were at ceiling. Participants 
were on average able to correctly respond to more than one 



Table 5 | Mean accuracy, reaction times (in ms) and reliability estimates of performance speed for all speed measures of emotion 
memory— across all trials and for single target emotions (if applicable). 



Condition Accuracy M (SD, SE) Reaction time; 1000/reaction Alpha / Omega /# of Trials 

time; M {SD, SE] 





Overall 


0.94 (0.05, 0.00) 


880 (193, 11) 
1.17 (0.25, 0.02) 


0.91/0.91/24 




Happirness 


0.88 (0.18, 0.01) 


923 (337 20) 
1.26 (0.26, 0.02) 


0.65/0.65/04 




Surprise 


0.85 (0.19, 0.01) 


867 (515, 31) 
1.34 (0.27 0.02) 


0.64/0.64/04 




Fear 


0.88 (0.18, 0.01) 


941 (269, 16) 
1.21 (0.23, 0.01) 


0.58/0.58/04 




Sadrness 


0.92 (0.15, 0.01) 


807 (186, 11) 
1.36 (0.25, 0.02) 


0.62/0.62/04 




Disgust 


0.93 (0.16, 0.01) 


833 (228, 14) 
1.34 (0.25, 0.02) 


0.55/0.58/04 




Anger 


0.93 (0.15, 0.01) 


876 (285, 17) 
1.31 (0.23, 0.01) 


0.64/0.65/04 




TASK 15: DELAYED NON-MATCHING TO SAMPLE WITH EMOTIONAL EXPRESSIONS ^^^^^^M 






Overall 


0.90 (0.08, 0.00) 


1555 (412, 25) 
0.80 (0.20, 0.01) 


0.95/0.96/36 




Happiness 


0.95 (0.11, 0.01) 


1159 (392, 23) 
1.00 (0.27 0.02) 


0.87/0.87/06 




Surprise 


0.97 (0.09, 0.01) 


1385 (436, 26) 
0.85 (0.24, 0.01) 


0.84/0.84/06 




Fear 


0.93 (0.13, 0.01) 


1474 (491, 29) 
0.82 (0.24, 0.01) 


0.81/0.82/06 




Sadness 


0.75 (0.23, 0.01) 


2232 (907 55) 
0.59 (0.21, 0.01) 


0.78/0.79/06 




Disgust 


0.85 (0.17 0.01) 


1930 (671, 40) 
0.64(0.20, 0.01) 


0.77/0.77/06 




Anger 


0.91 (0.14, 0.01) 


1589 (492, 30) 
0.73 (0.21, 0.01) 


0.76/0.76/06 




TASK 16: RECOGNITION SPEED OF MORPHED EMOTIONAL EXPRESSIONS ^^^^^^M 






Overall 


0.89 (0.09, 0.01) 


1068 (228, 13) 
1.25 (0.20, 0.01) 


0.92/0.92/36 




Happiness — surprise 


0.89 (0.17 0.01) 


870 (269, 16) 
1.42 (0.25, 0.02) 


0.63/0.64/07 




Happiness — anger 


0.89 (0.19, 0.01) 


948 (307 18) 
1.27 (0.25, 0.02) 


0.64/0.65/05 




Fear — surprise 


0.90 (0.12, 0.01) 


958 (290, 17) 
1.33 (0.25, 0.02) 


0.71/0.72/07 




Fear — sadness 


0.89 (0.17 0.01) 


932 (270, 16) 
1.31 (0.27 0.02) 


0.67/0.68/05 




Sadness — disgust 


0.84 (0.21, 0.01) 


1488 (685, 41) 
0.95 (0.25, 0.01) 


0.60/0.62/05 




Disgust — anger 


0.93 (0.11, 0.01) 


1264 (380, 23) 
1.12 (0.22, 0.01) 


0.64/0.64/07 





M, means, SE, standard errors, SD, standard deviations; Alpha (a) and Omega (u>) coefficients calculated for speed measures f 1000/reaction time); # of Trials, the 
number of trials used for calculating the reliability coefficients. 
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trial per second. There were very small (but statistically signifi- 
cant) differences between emotion categories, as suggested by an 
rniANOVA, [F(5, 1340) = 29.99, p < 0.001, t)^ = 0.04]. Reliability 
estimates were excellent for the overall task and acceptable for 
emotion specific latency scores given the low number of trials for 
an emotion category. These results suggest that Task 14 is a psy- 
chometricaUy sound measure of emotion recognition speed from 
faces. 

TASK 15: DELAYED NON-MATCHING TO SAMPLE WITH EMOTIONAL 
EXPRESSIONS 

The present task was inspired by the Delayed Non-Matching 
paradigm implemented for face identity recognition by 
Herzmann et al. (2008) and was modified here in order to 
assess emotion expression recognition. This task requires the 
participant to store and maintain a memory of each emotion 
expression; the images are presented during the learning phase 
for a short period of time and during the experimental trials the 
images have to be recollected from the visual primary memory 
and compared with a novel facial expression. Because the task 
requires a short maintenance time for a single item in the absence 
of interfering stimuli, we expect the task to show accuracy rates 
at ceiling and to measure short-term recognition speed. 

Procedure 

A facial expression of happiness, surprise, fear, sadness, disgust, or 
anger was presented for 1 second. Following a delay of 4 s (500 ms 
mask; 3500 ms blank screen) the same emotion expression was 
presented together with a different facial expression. Depending 
on where the new (distracter) expression was presented, partic- 
ipants had to press a left or right response-key on a standard 
keyboard in order to indicate the distractor facial expression. In 
each trial we used three different identities of the same sex. During 
the 36 experimental trials, expressions belonging to each emotion 
category had to be encoded six times. There were three practice 
trials. 

Results and discussion 

Results are summarized in Table 5. Average accuracy across par- 
ticipants suggests ceiling effects for recognizing emotion expres- 
sions, with the exception of sadness where recognition rates were 
rather low. Speed of sadness recognition should be carefully inter- 
preted because it relies on just a few latencies associated with 
correct responses for many of the participants. Overall, partic- 
ipants correctly recognized less than one item per second and 
exactly one item per second in the case of happy faces. There were 
medium-sized differences in performance speed across emotion 
categories, [_F(5 1340) = 296.36, p < 0.001, = 0.26], and pair- 
wise comparisons suggested statistically significant differences in 
the recognition speed for all emotions except for fear and surprise. 
The rank order of recognition speed of the six emotions followed 
a pattern comparable to the pattern identified for the emo- 
tion perception accuracy tasks reported earlier. Divergent stimuli 
compared to happiness, surprise (and fear) as target expressions 
were identified the quickest, followed by anger, disgust, and finally 
sadness. Reliability estimates are again excellent for the overall 
score and very good for emotion specific trials, suggesting good 
psychometric quality. 



TASK 16: RECOGNITION SPEED OF MORPHED EMOTIONAL 
EXPRESSIONS 

This task is an implementation of a frequently used paradigm 
for measuring recognition memory (e.g., Warrington, 1984) and 
has also been used by Herzmann et al. (2008) for face identity 
recognition. The present task is derived from the face identity task 
and applied for emotion processing. We used morphed emotion 
expressions resulting in combinations of happiness, surprise, fear, 
sadness, disgust, or anger that were designed to appear as natural- 
istic as possible. Because the stimuli do not display prototypical 
expressions, the emotional expressions are difficult to memorize 
purely on the basis of semantic encoding strategies. The goal of 
this task was to measure visual encoding and recognition of facial 
expressions. In order to keep the memory demand low and to 
design the task to be a proper measure of speed, single expressions 
were presented for a relatively long period during the learning 
phase. 

Procedure 

This task consisted of one practice block and six experimental 
blocks. We kept stimulus identity constant within blocks. A block 
started with a 4-s learning phase, followed by a short delay dur- 
ing which participants were asked to answer two questions from 
a scale measuring extraversion, and finally the recognition phase. 
The extraversion items were included as an intermediate task in 
order to introduce a memory consolidation phase. There was 
one morphed expression to memorize per block during the 4-s 
learning time. The morphs were generated as a blend of two 
equally weighted, easily confusable emotional expressions accord- 
ing to their proximity on the emotion hexagon (Sprengelmeyer 
et al., 1996). During retrieval, the identically morphed expression 
was presented three times within a pseudo-randomized sequence 
with three different distracters. This resulted in six (targets and 
distracters) x six (blocks of trials) = 36 trials in total. All stim- 
uli were presented in isolation during the recognition phase, 
each requiring a response. Participants indicated via a key press 
whether or not the presented stimulus was included in the learn- 
ing phase at the beginning of the block. There were no restrictions 
on response time. 

Results and discussion 

Average performance, in terms of accuracy and the two differ- 
ent speed scores, are presented in Table 5. As expected, accu- 
racy rates were at ceiling and participants were able to respond 
correctly to somewhat more than one trial per second on aver- 
age. Specific morphs of different mixtures of emotion categories 
modulated performance differences in recognition speed; effects 
were of medium size, [_F(5, 1340) = 306.50,1) < 0.001, = 0.28]. 
Reliabilities were excellent for the overall score of inverted laten- 
cies and in the acceptable range for emotion-specific trials. The 
indicators derived from this task are therefore suitable measures 
of the speed of emotion recognition from faces. 

GENERAL DISCUSSION 

We begin this discussion by providing a summary and evaluation 
of key findings and continue with methodological considerations 
regarding the overarching goal of this paper. We conclude with 
delineating some prospective research questions. 
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SUMMARY OF KEY FINDINGS 

We designed and assessed 16 tasks developed to measure indi- 
vidual differences in the ability to perceive or recognize facial 
emotion expressions. Each task explicitly measures these abili- 
ties by provoking maximum effort in participants and each item 
in each task has a veridical response. Performance is assessed by 
focusing on either the accuracy or speed of response. Competing 
approaches to scoring the measures were considered and com- 
pared for several tasks. The final set of suggested scoring proce- 
dures are psychometricaUy sound, simple, adequately distributed 
for future multivariate analysis, and exhaust the information col- 
lected. Therefore, all tasks can be considered to be measures 
of abilities. For each of the tasks we presented emotion-specific 
(where applicable) and overall scores concerning mean perfor- 
mance, individual differences in performance, and precision. 
Additionally, coefficients of internal consistency and factor satu- 
ration were presented for each task — including emotion-specific 
results when possible. 

Taken together, the 16 tasks worked well: They were neither 
too easy nor too hard for the participants, and internal con- 
sistency and factor saturation were satisfactory. With respect to 
mean performance across all emotion domains and tasks there 
was an advantage for happy faces in comparison to aU other 
facial expressions. This finding parallels several previous reports 
of within- and across-subject studies on facial expression recog- 
nition (e.g., Russell, 1994; Elfenbein and Ambady, 2002b,c; Jack 
et al., 2009; Recio et al., 2013). With respect to results concern- 
ing the covariance structure it might be argued that some of 
the emotion-specific results are not promising enough because 
some of the psychometric results are still in the lower range of 
desirable magnitudes. However, the tasks presented here ought 
not to be considered as stand-alone measures. Instead, preferably 
a compilation of these tasks should be jointly used to mea- 
sure important facets of emotion-related interpersonal abilities. 
Methodologically, the tasks presented here would thus serve like 
the items of a conventional test as indicators below presupposed 
latent factors. Additionally, some of the unsatisfactory psychome- 
tric coefficients are likely to improve if test length is increased. 
Depending on available resources in a given study or applica- 
tion context and in fine with the measurement intention tasks for 
one or more ability domains can be sampled from the present 
collection. We recommend sampling three or more tasks per abil- 
ity domain. The duration estimates provided in Table 1 facilitate 
compilation of such task batteries in line with pragmatic needs of 
a given study or application context. 

ASSESSMENT OF THE OVERARCHING GOAL TO DEVELOP A BAHERY 
OF INDICATORS FOR PERCEPTION AND RECOGNITION OF FACIALLY 
EXPRESSED EMOTIONS 

In this paper, we presented a variety of tasks for the purpose of 
capturing individual differences in emotion perception and emo- 
tion recognition. The strategy in developing the present set of 
tasks was to sample measures established in experimental psy- 
chology and to adapt them for psychometric purposes. It is 
important to note that the predominant conceptual model in 
individual differences psychology presupposes effect indicators 
of common constructs. In these models, individual differences 



in indicators are caused by individual differences in at least one 
latent variable. Specific indicators in such models can be con- 
ceived as being sampled from a domain or range of tasks. Research 
relying on single indicators sample just one task from this domain 
and are therefore analogous to single case studies sampling just 
a single person. The virtue of sampling more than a single task 
is that further analysis of a variety of such measures allows 
abstracting not only from measurement error but also from task 
specificity. 

In elaboration of this sampling concept we defined the domain 
from which we were sampling tasks a priori. Although general 
principles of sampling tasks from a domain have been specified, 
implicitly by Campbell and Fiske (1959) and Cattell (1961) and 
more explicitly by Little et al. (1999), the precise definition of 
"domain" is still opaque. In the present context, we applied a 
first distinction, which is well established in research on indi- 
vidual differences in cognitive abilities, namely between speed 
and accuracy tasks. A second distinction is based on the cogni- 
tive demand (perception vs. recognition) of an indicator. A speed 
task is defined as being so simple that members of the applica- 
tion population complete all tasks correctly if given unlimited 
time. An accuracy task is defined as being so hard that a substan- 
tial proportion of the application population cannot complete it 
correctly even if given unlimited time. We expect that once the 
guidelines and criteria suggested for tasks in the introduction are 
met and the following classifications of demands are applied — (a) 
primarily assessing emotion perception or emotion recognition 
and (b) provoking behavior that can be analyzed by focusing on 
either speed or accuracy, no further substantial determinants of 
individual differences can be established. Therefore, we anticipate 
that the expected diversity (Little et al, 1999) in the four domains 
distinguished here is low. This statement might seem very bold 
but we need to derive and test such general statements in order 
to avoid mixing up highly specific indicators with very general 
constructs. Obviously, this statement about task sampling applies 
to measures already developed (some of which were discussed in 
the introduction) and to measures still to be developed. A broad 
selection of tasks can be seen as a prerequisite to firmly establish 
the structure of individual differences in a domain under investi- 
gation. This is vividly visible in review work on cognitive abilities 
(Carroll, 1993). 

It is not yet sufficiently clear how the arguments concern- 
ing task sampling apply to prior work on individual differences 
in socio-emotional abilities. Contemporary theories of socio- 
emotional abilities (e.g., Zeidner et al., 2008) clearly place the 
abilities investigated here in the realm of emotional intelligence. 
For example, the currently most prominent theory of emotional 
intelligence — the four-branch model by Mayer et al. (1999) — 
includes emotion perception as a key factor. It is important to 
note that many of the prevalent measures of emotional intelli- 
gence, which rely on self-report of typical behavior, do not meet 
standards that should be applied to cognitive ability measures 
(Wilhelm, 2005). Assessment tools of emotional intelligence, 
which utilize maximum effort performance in ability measures, 
meet some but not all of these criteria (Davies et al., 1998; Roberts 
et al., 2001). Arguably, these standards are met by the tasks 
discussed in the theoretical section and newly presented here. 
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A convergent validation of the tasks presented here with popular 
measures of emotional intelligence — such as emotion percep- 
tion measures in the MSCEIT (Mayer et al, 1999; MacCann and 
Roberts, 2008) — is therefore not as decisive as it usually is. 

FUTURE RESEARCH DIRECTIONS 

This paper presents a compilation of assessment tools. Their 
purpose is to allow a psychologically based and psychometri- 
cally sound measurement of highly important receptive socio- 
emotional abilities, that is, the perception and recognition of 
facial expressions of emotions. We think that the present com- 
pilation has a variety of advantages over available measures and 
we stressed these advantages at several places in this paper. The 
goal was to provide a sufficiently detailed description of the 
experimental procedures for each task, to provide difficulty and 
reliability estimates for tasks and emotion specific subscales cre- 
ated within tasks. In further research we wiU consider the factorial 
structure across tasks and investigate competing measurement 
models of emotion perception and recognition — applying theo- 
retically important distinctions between speed vs. accuracy and 
perception vs. recognition. An essential and indispensable step for 
such further research is the close inspection of the psychometric 
quality of each task. 

The application populations of the present tasks are older ado- 
lescents and adults and task difficulty was shown to be somewhere 
between adequate and optimal for younger adults included in the 
present sample. With some adaptations the tasks can be applied 
to other populations too. The goal of our research is met best, 
when these tools (and adaptations or variants of them) are fre- 
quently used in many different research fields. We will briefly 
present some research directions we currently pursue in order to 
illustrate potential uses of our battery. 

One question we are currently investigating is the distinc- 
tion between the perception and recognition of unfamiliar 
neutral faces (Herzmann et al, 2008; Wilhelm et al, 2010) 
and the perception and recognition of unfamiliar emotional 
faces. Investigating such questions with psychometric methods is 
important in order to provide evidence that facial emotion recep- 
tion is a determinant of specific individual differences. In elabo- 
ration of this research branch we also study individual differences 
in posing emotional expressions (Olderbak et al, 2013). 

Obviously, the measures presented in this paper are also a 
promising approach when studying group differences — for exam- 
ple when studying differences between psychopathic and unim- 
paired participants (Marsh and Blair, 2008). Establishing deficient 
socio-emotional (interpersonal) abilities as key components of 
mental disorders hinges upon a solid measurement in many cases. 
We hope that the present contribution helps to provide such 
measurements. 

Finally, we want to mention two important restrictions of 
the face stimuli used in the present tasks. First, the stimuli are 
exclusively portraits of white young adult middle-Europeans. 
The "Own-Race bias" (Meissner and Brigham, 2001) is a well- 
established effect in identity processing and it also applies to tasks 
capturing emotion perception (Elfenbein and Ambady, 2002a). It 
would therefore be adequate to create additional stimuli showing 
subjects of other ethnicity. Second, faces were presented without 



any context information, which — in most cases — enhances the 
reliability of their expressed (emotional) meanings (cf Walla and 
Panksepp, 2013). Obviously, the tasks presented here could be 
used with stimulus sets varying in origin, ethnicity, color, age 
etc., and most of them could be extended to stimuli including 
varying contexts. Software code and more detail concerning the 
experimental setup and task design are available from the authors 
upon request. 
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